Abstract

We derive sharp thresholds for exact recovery of communities in a weighted stochastic block 
model, where observations are collected in the form of a weighted adjacency matrix, and the 
weight of each edge is generated independently from a distribution determined by the community
membership of its endpoints. Our main result, characterizing the precise boundary
between success and failure of maximum likelihood estimation when edge weights are drawn
from discrete distributions, involves the Renyi divergence of order $\frac{1}{2}$ between the distributions
of within-community and between-community edges. When the Renyi divergence is above a
certain threshold, meaning the edge distributions are sufficiently separated, maximum likelihood
succeeds with probability tending to 1; when the Renyi divergence is below the threshold,
maximum likelihood fails with probability bounded away from 0. In the language of graphical 
channels, the Renyi divergence pinpoints the information-theoretic capacity of discrete graphical 
channels with binary inputs. Our results generalize previously established thresholds derived 
specifically for unweighted block models, and support an important natural intuition relating 
the intrinsic hardness of community estimation to the problem of edge classification. Along the 
way, we establish a general relationship between the Renyi divergence and the probability of 
success of the maximum likelihood estimator for arbitrary edge weight distributions. Finally, 
we discuss consequences of our bounds for the related problems of censored block models and 
submatrix localization, which may be seen as special cases of the framework developed in our 
paper. 


1 Introduction 

The recent explosion of interest in network data has created a need for new statistical methods for 
analyzing network datasets and interpreting results [30, 13, 22, 16]. One active area of research with 
diverse applications in many scientific fields pertains to community detection and estimation, where 
the information available consists of the presence or absence of edges between nodes in the graph, 
and the goal is to partition the nodes into disjoint groups based on their relative connectivity [14, 
19, 33, 36, 26, 32]. 

A standard assumption in statistical modeling is that conditioned on the community labels 
of the nodes in the graph, edges are generated independently according to fixed distributions 
governing the connectivity of nodes within and between communities in the graph. This is the 
setting of the stochastic block model (SBM) [21, 39, 38]. In the homogeneous case, edges follow one 
distribution when both endpoints are in the same community, regardless of the community label; 
and edges follow a second distribution when the endpoints are in different communities. A variety 
of interesting statistical results have been derived recently characterizing the regimes under which
exact or weak recovery of community labels is possible (e.g., [27, 29, 25, 1, 2, 4, 17, 18, 40]). Exact 
recovery refers to the case where the communities are partitioned perfectly, and a corresponding 
estimator is called strongly consistent. On the other hand, weak recovery refers to the case where 
the estimated community labels are positively correlated with the true labels. 

In the setting of stochastic block models with nearly-equal community sizes and homogeneous 
connection probabilities, Zhang and Zhou [40] derive minimax rates for statistical estimation in 
the case of exact recovery. Interestingly, the expression they obtain contains the Renyi divergence
of order $\frac{1}{2}$ between two Bernoulli distributions, corresponding to the probability of generation for
within-community and between-community edges. Hence, the hardness of recovering the community
assignments is somehow captured in the hardness of inferring whether pairs of nodes lie within
the same community or in different communities. This result has a very natural intuitive interpretation,
since knowing whether each pair of nodes (or even each pair of nodes along the edges of a
spanning tree of the graph) lies in the same community would clearly lead to perfect recovery of
the community labels. On the other hand, this constitutes a somewhat different perspective from
the prevailing viewpoint of the hardness of recovering community labels being innately tied to the
success or failure of a hypothesis testing problem determining whether an individual node lies in one
community or another [4, 29, 40]. Several other attempts have been made to relate the sharp threshold
behavior of community estimation to various quantities in information theory [3, 10, 12, 4], but
the precise relationship is still largely unknown. 

The vast majority of existing literature on stochastic block models has focused on the case where 
no other information is available beyond the unweighted adjacency matrix. In an attempt to better 
understand the information-theoretic quantities at work in determining the thresholds for exact 
recovery in stochastic block models, we will widen our consideration to the more general weighted 
problem. Note that situations naturally arise where network datasets contain information about 
the strength or type of connectivity between edges, as well [31, 9]. In social networks, information 
may be available quantifying the strength of a tie, such as the number of interactions between 
the individuals in a certain time period [35]; in cellular networks, information may be available 
quantifying the frequency of communication between users [8]; in airline networks, edges may be 
labeled according to the type of air traffic linking pairs of cities [7]; and in neural networks, edge 
weights may symbolize the level of neural activity between regions in the brain [34]. Of course, 
the connectivity data could be condensed into an adjacency matrix consisting of only zeros and 
ones, but this would result in a loss of valuable information that could be used to recover node 
communities. 

In this paper, we analyze the “weighted” setting of the stochastic block model, where edges are 
generated from arbitrary distributions that are not restricted to being Bernoulli. Our key question is 
whether the Renyi divergence of order $\frac{1}{2}$ appearing in the results of Zhang and Zhou [40] continues to
persist as a fundamental quantity that determines the hardness of exact recovery in the generalized 
setting. Surprisingly, our answer is affirmative. First, we show that the Renyi divergence between 
the within-community and between-community edge distributions may be used directly to control 
the probability of failure of the maximum likelihood estimator. Hence, as the Renyi divergence 
increases, corresponding to edge distributions that are further apart, the probability of failure of 
maximum likelihood is driven to zero. Next, we focus on a specific regime involving discrete weights 
(or colors), where the average number of edges of each specific color connected to a node scales 
according to $\Theta(\log n)$. In this case, we show that the bounds derived earlier involving the Renyi
divergence are in fact tight, and exact recovery is impossible when the Renyi divergence between 
the weighted distributions is below a certain threshold. Our results are also applicable in the more 
general setting of more than two communities. Finally, we discuss the consequences of our theorems 
in the context of decoding in discrete graphical channels and submatrix localization with continuous
distributions. 

The remainder of the paper is organized as follows: In Section 2, we introduce the basic background
and mathematical notation used in the paper. In Section 3, we present our main theoretical
contributions, beginning with achievability results for the maximum likelihood estimator
in a weighted stochastic block model with arbitrarily many communities. We then derive sharp 
thresholds for exact recovery in the discrete weighted case, and then interpret our results in the 
framework of graphical channels and submatrix localization. Section 4 contains the main arguments 
for the proofs of our theorems. We conclude in Section 5 with a discussion of several open questions 
related to phase transitions in weighted stochastic block models. 


2 Background and problem setup 


Consider a stochastic block model with $K \ge 2$ communities, each with $n$ nodes. For each node $i$,
let $\sigma(i) \in \{1, 2, \dots, K\}$ denote the community assignment of the node. A weighted stochastic block
model consists of a random graph generated on the vertices $\{1, 2, \dots, nK\}$, using the community
assignments $\sigma$, as well as a sequence of distributions $\{p_n^{(k_1, k_2)}\}$ for $1 \le k_1, k_2 \le K$ and
$n \ge 1$. The support of the distributions may be continuous or discrete. In the discrete case, we
will often use the terms weight, color, and label interchangeably. The weighted random graph is
generated as follows: Each edge $(i, j)$ is assigned a random weight $W(i,j) \sim p_n^{(\sigma(i), \sigma(j))}$, independent
of the weights of all other edges. Such a stochastic block model is called non-homogeneous, since
the distributions of the edge weights depend not only on whether the endpoints of an edge belong
to the same community, but also on which communities they belong to.

In this paper, we will consider a homogeneous weighted stochastic block model, which may be
described simply as follows: Given sequences of distributions $\{p_n\}$ and $\{q_n\}$, every edge $(i, j)$ is
assigned a random weight $W(i,j)$, independently of all other edge weights, such that

$$W(i,j) \sim \begin{cases} p_n & \text{if } \sigma(i) = \sigma(j), \\ q_n & \text{if } \sigma(i) \ne \sigma(j). \end{cases} \qquad (1)$$


The traditional (unweighted) stochastic block models constitute a special case of weighted stochastic 
block models, since we may encode edges with weights 1 or 0, corresponding to the presence or 
absence of an edge. 
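
As an illustration of the generative model (1), the following is a minimal sketch in Python of how a weight matrix could be sampled. The function names `sample_weighted_sbm` and `bernoulli`, and the specific probabilities 0.9 and 0.1, are assumptions made for this example and do not appear in the paper; the within-community and between-community distributions are passed in as arbitrary samplers, so the unweighted model is recovered by using Bernoulli samplers.

```python
import random

def sample_weighted_sbm(labels, sample_p, sample_q, rng):
    """Draw a symmetric weight matrix for the homogeneous model (1).

    labels   : list of community labels sigma(i), one per node
    sample_p : callable rng -> weight, draws from p_n (within-community)
    sample_q : callable rng -> weight, draws from q_n (between-community)
    """
    m = len(labels)
    W = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            # The edge distribution depends only on whether labels agree.
            draw = sample_p if labels[i] == labels[j] else sample_q
            W[i][j] = W[j][i] = draw(rng)
    return W

def bernoulli(prob):
    """Return a sampler for weights in {0, 1} with P(weight = 1) = prob."""
    return lambda rng: 1 if rng.random() < prob else 0

# Unweighted special case: K = 2 communities with n = 4 nodes each.
rng = random.Random(0)
labels = [1] * 4 + [2] * 4
W = sample_weighted_sbm(labels, bernoulli(0.9), bernoulli(0.1), rng)
```

Any other pair of samplers (for example, Gaussian or multi-color discrete weights) can be substituted without changing the sampling loop.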

Our ultimate goal is to infer the underlying communities based on observing the weight matrix 
W. Several differing notions of inference have been studied in the case of unweighted stochastic 
block models. In the “sparse regime,” where the distributions $p_n$ and $q_n$ scale as

$$p_n(0) = 1 - \frac{a}{n}, \quad p_n(1) = \frac{a}{n}, \quad \text{and} \quad q_n(0) = 1 - \frac{b}{n}, \quad q_n(1) = \frac{b}{n},$$


for constants $a, b > 0$, one cannot hope to recover the communities exactly, since with high
probability the graph is not connected. The notion of “detection” or “weak recovery” considered in this
regime consists of obtaining community assignments that are positively correlated with the true
assignment. It has been shown in the case $K = 2$ that if

$$(a - b)^2 < a + b, \qquad (2)$$

it is impossible to obtain such an assignment^1; whereas if

$$(a - b)^2 > a + b,$$

obtaining a positively correlated assignment becomes possible [28, 25].

In order to obtain exact recovery, a simple necessary condition is that the graph must be
connected, meaning the probability of having an edge must scale according to $\Omega\left(\frac{\log n}{n}\right)$. This
regime was considered in Abbe et al. [2], where the probabilities were given by

$$p_n(0) = 1 - \frac{a \log n}{n}, \quad p_n(1) = \frac{a \log n}{n}, \quad \text{and} \quad q_n(0) = 1 - \frac{b \log n}{n}, \quad q_n(1) = \frac{b \log n}{n},$$

for constants $a, b > 0$. In this regime, it was shown [2] that exact recovery of communities is possible
if

$$\left| \sqrt{a} - \sqrt{b} \right| > 1,$$

and impossible if

$$\left| \sqrt{a} - \sqrt{b} \right| < 1.$$


Apart from exact recovery (also known as strong consistency) and weak recovery, a notion of partial
recovery (also known as weak consistency) has also been considered [29, 5, 40]. This notion lies
between the other two notions of recovery, and only requires the fraction of misclassified nodes to
converge in probability to 0 as $n$ becomes large. A very general result for the $K = 2$ case, characterizing
when exact and partial recovery are possible for the unweighted homogeneous stochastic block
model, is provided in Mossel et al. [29]. Zhang and Zhou [40] consider the problem of community detection
in a minimax setting with an appropriate loss function, where the parameter space consists
of both homogeneous and non-homogeneous stochastic block models, the number of communities
may be fixed or growing, and the community sizes need not be exactly equal. In particular, for the
case of homogeneous stochastic block models where the community sizes are almost equal and scale
as $n$, they show that the loss function decays at the rate of $\exp\left(-(1 + o(1))\, nI\right)$ whenever $nI \to \infty$.

Here, $I$ is the Renyi divergence of order $\frac{1}{2}$ between the two Bernoulli distributions corresponding
to between-community and within-community edges. Furthermore, they show that exact recovery
is possible if and only if the loss function is $o(n^{-1})$, whereas partial recovery is possible if and only
if it is $o(1)$. The exact recovery bounds achieved in this way match those of Abbe et al. [2].

Heimlicher et al. [20] also conjectured that similar threshold phenomena should exist in the case 
of the stochastic block model with discrete weights. In particular, Heimlicher et al. [20] consider 
the homogeneous case where K = 2 and the between-community and within-community connection 
probabilities scale as $\Theta\left(\frac{1}{n}\right)$. Analogous to expression (2), they conjectured a threshold in terms of
the discrete probabilities such that weak recovery is possible above this threshold and impossible 
below the threshold. The impossibility of reconstruction below the conjectured threshold was 
established in Lelarge et al. [24], and efficient algorithms that achieve weak recovery were provided 
for a constant above the threshold. 

^1 We appropriately modify the conditions to take into account that the community size in our setting is $n$, as
opposed to $n/2$.

In this paper, we consider the problem of exact recovery in the homogeneous weighted stochastic
block model with $K \ge 2$ communities. By definition, the estimator that minimizes the probability of
erroneous community assignments is the maximum likelihood estimator: If the maximum likelihood
estimator fails to recover the communities with a certain probability, then the probability of error
of any other estimator is also lower-bounded by the same probability. Thus, to show impossibility
of recovery, it is sufficient to show that the maximum likelihood estimator fails with probability
bounded away from 0. Finally, note that as in the unweighted case, the maximum likelihood estimator in the
weighted case is easy to describe in terms of a min-cut graph partition [24]. Let $\mathcal{L}$ be the class of
edge labels, and let $p_n$ and $q_n$ be distributions supported on $\mathcal{L}$ which describe the probabilities of
edge labels for within-community and between-community edges. For an edge with label $\ell \in \mathcal{L}$,
we assign a weight of $\log \frac{p_n(\ell)}{q_n(\ell)}$. The maximum likelihood estimator then seeks to partition the
vertices into disjoint communities in such a way that the sum of weights of between-community
edges is minimized.
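
The min-cut description above can be made concrete for very small graphs. The following is a brute-force sketch for $K = 2$ equal-sized communities, not an efficient algorithm and not the procedure used in the paper's proofs; the function name `ml_partition`, the toy label matrix, and the probabilities 0.2 and 0.8 are assumptions chosen for illustration. Each edge label receives the weight $\log(p(\ell)/q(\ell))$, and every balanced split is scored by the total weight of the edges crossing it.

```python
import math
from itertools import combinations

def ml_partition(W, p, q):
    """Brute-force ML estimate for K = 2 equal-sized communities.

    W    : symmetric matrix of discrete edge labels
    p, q : dicts mapping label -> probability (within / between);
           all probabilities are assumed strictly positive.
    Returns the balanced split (S, complement) that minimizes the
    total log(p/q) weight of the edges crossing the cut.
    """
    m = len(W)
    nodes = range(m)
    weight = {label: math.log(p[label] / q[label]) for label in p}
    best, best_cut = None, math.inf
    # Keep node 0 in S so that each split is enumerated only once.
    for rest in combinations(range(1, m), m // 2 - 1):
        S = {0, *rest}
        cut = sum(weight[W[i][j]] for i in S for j in nodes if j not in S)
        if cut < best_cut:
            best, best_cut = S, cut
    return best, set(nodes) - best

# Toy example: nodes {0, 1} and {2, 3} form the two communities.
W = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
p = {0: 0.2, 1: 0.8}   # within-community label probabilities
q = {0: 0.8, 1: 0.2}   # between-community label probabilities
estimate = ml_partition(W, p, q)
```

The enumeration grows exponentially with the number of nodes, so this sketch is only meant to clarify the objective being minimized.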

3 Main results and consequences 

In this section, we present our main results concerning achievability and impossibility of exact 
recovery, along with several applications. 

3.1 Renyi divergence and achievability 

We begin with a result that controls the probability of success for maximum likelihood estimation 
under the general homogeneous model (1), when K = 2. Our first theorem relates the probability 
of failure of maximum likelihood to the Renyi divergence between the distributions for within- 
community and between-community edge weights. 

Theorem 3.1 (Proof in Section 4.1). Consider a stochastic block model with two communities 
of size n, with connection probabilities governed by the model (1). Then the probability that the 
maximum likelihood estimator fails is bounded as 

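$$\mathbb{P}(F) \le \sum_{k=1}^{n/2} \exp\left( 2k \left( \log \frac{n}{k} + 1 \right) - 2k(n - k) I \right), \qquad (3)$$

where $I$ is the Renyi divergence of order $\frac{1}{2}$ between the edge weight distributions $p_n(x)$ and $q_n(x)$,
given by

$$I = \begin{cases} -2 \log \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)} \, dx, & \text{for continuous distributions on } \mathbb{R}, \\ -2 \log \sum_{\ell \ge 0} \sqrt{p_n(\ell) q_n(\ell)}, & \text{for discrete distributions on } \mathbb{N}. \end{cases}$$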

Note that the general exponential bound in inequality (3) decreases with $I$, which corresponds
to the distributions $p_n$ and $q_n$ becoming more separated. This corroborates the intuition that the
failure probability of maximum likelihood $\mathbb{P}(F)$ appearing on the left-hand side of inequality (3)
should decrease with $I$, since the problem becomes easier to solve as the within-community and
between-community distributions become easier to distinguish.
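
To make the quantities in Theorem 3.1 concrete, the following sketch evaluates the discrete form of the Renyi divergence of order 1/2 and the right-hand side of inequality (3). The function names `renyi_half` and `failure_bound`, and the example distributions, are assumptions for illustration; the formulas themselves are those stated in the theorem.

```python
import math

def renyi_half(p, q):
    """Renyi divergence of order 1/2: I = -2 log sum_l sqrt(p(l) q(l))."""
    return -2.0 * math.log(sum(math.sqrt(pl * ql) for pl, ql in zip(p, q)))

def failure_bound(n, I):
    """Right-hand side of inequality (3) for two communities of size n."""
    return sum(
        math.exp(2 * k * (math.log(n / k) + 1) - 2 * k * (n - k) * I)
        for k in range(1, n // 2 + 1)
    )

# Example: within-community weights are fair coin flips, while
# between-community weights take the value 1 with probability 0.1.
p = [0.5, 0.5]
q = [0.9, 0.1]
I = renyi_half(p, q)            # equals log(1.25), since the sum squares to 0.8
bound = failure_bound(100, I)   # upper bound on P(F) for n = 100
```

For these fixed distributions the bound is already negligible at $n = 100$; the regimes studied later in the paper are harder because $I$ itself shrinks with $n$.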

Of course, Theorem 3.1 is particularly informative in regimes where we can show that the right-hand
side of inequality (3) tends to 0, implying that the maximum likelihood estimator succeeds
with probability tending to 1. To illustrate this point, we have the following corollary:

Corollary 3.1 (Proof in Section 4.2). Suppose the Renyi divergence between $p_n$ and $q_n$ satisfies

$$\liminf_{n \to \infty} \frac{nI}{\log n} > 1.$$

Then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$.

We will discuss the implications of Corollary 3.1 in various scenarios in the sections below. We 
also have a version of Theorem 3.1 that is applicable to the case of more than two communities. We 
state and prove the more general theorem separately, since the argument for $K = 2$ is substantially
simpler. 

Theorem 3.2 (Proof in Section 4.3). Consider a stochastic block model with K communities of size 
n, with connection probabilities governed by the model (1). Then the probability that the maximum 
likelihood estimator fails is bounded as 



(4) 


where $I$ is the Renyi divergence of order $\frac{1}{2}$ between the edge weight distributions $p_n(x)$ and $q_n(x)$.
In particular, if

$$\liminf_{n \to \infty} \frac{nI}{\log n} > 1, \qquad (5)$$

then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$.

The proof of Theorem 3.2 builds upon the arguments of Zhang and Zhou [40] and extends them 
to more general distributions. 


3.2 Thresholds for weighted stochastic block models 


In this section, we derive a threshold phenomenon for exact recovery in the case when $p_n$ and
$q_n$ are discrete distributions. Analogous to the scenario considered in [2], we now concentrate on
the regime where the probability of having an edge scales as $\Theta\left(\frac{\log n}{n}\right)$. However, in addition to
Bernoulli distributions, our framework accommodates distributions on a larger alphabet, denoted
by the set $\{0, 1, \dots, L\}$ for $L \ge 1$. Thus, instead of simply observing the presence or absence of an
edge, we may also observe the corresponding color or weight of the edge. We define the distributions
$\{p_n, q_n\}$ as follows: For two vectors $a = [a_1, a_2, \dots, a_L]$ and $b = [b_1, b_2, \dots, b_L]$ in $\mathbb{R}_+^L$, define

$$p_n(0) = 1 - \frac{u \log n}{n}, \quad \text{and} \quad p_n(\ell) = \frac{a_\ell \log n}{n}, \quad \forall\, 1 \le \ell \le L, \qquad (6)$$

$$q_n(0) = 1 - \frac{v \log n}{n}, \quad \text{and} \quad q_n(\ell) = \frac{b_\ell \log n}{n}, \quad \forall\, 1 \le \ell \le L, \qquad (7)$$

where $u = \sum_{\ell=1}^{L} a_\ell$ and $v = \sum_{\ell=1}^{L} b_\ell$. We wish to determine a criterion in terms of $a$ and $b$ that
describes when it is possible to exactly determine the communities in this model.

Our first result is the following theorem guaranteeing the success of the maximum likelihood 
estimator: 


Theorem 3.3 (Proof in Section 4.4). Suppose

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1. \qquad (8)$$

Then the maximum likelihood estimator recovers the communities exactly with probability converging
to 1 as $n \to \infty$.
We note that the expression on the left-hand side of inequality (8) is increasing in $L$, agreeing
with the intuition that the exact recovery problem becomes easier when more edge colors are
available: Given a graph with $L$ edge colors, we may always erase certain colors to obtain a new
graph with $L' < L$ colors, and then apply a maximum likelihood estimator to the new graph. The
probability of success of this estimator can be no larger than the probability of success of a
maximum likelihood estimator applied to the original graph; in particular, if

$$\sum_{\ell=1}^{L'} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1, \qquad (9)$$

implying that maximum likelihood succeeds with probability converging to 1 on the graph with
$L'$ colors, the probability of success of maximum likelihood on the graph with $L$ colors must also
converge to 1. Indeed, inequality (9) implies inequality (8), since $L' < L$. Similarly, we may check
that by the Cauchy-Schwarz inequality, the following relation holds:

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \ge \left( \sqrt{\sum_{\ell=1}^{L} a_\ell} - \sqrt{\sum_{\ell=1}^{L} b_\ell} \right)^2.$$

This captures the fact that if the maximum likelihood estimator succeeds with probability converging
to 1 on a graph with $L$ colors when we replace all occurring edges with a single color,
then the maximum likelihood estimator on the original graph should also succeed with probability
converging to 1.
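
The effect of erasing or merging colors on the quantity in inequality (8) can be checked numerically. In the sketch below, the function name `threshold_statistic` and the parameter vectors are assumptions chosen for illustration: the two colors are individually informative, but their totals coincide, so merging them into a single color destroys all information about the communities.

```python
import math

def threshold_statistic(a, b):
    """Left-hand side of inequality (8): sum_l (sqrt(a_l) - sqrt(b_l))^2."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

a = [4.0, 1.0]
b = [1.0, 4.0]
full = threshold_statistic(a, b)                   # both colors kept: 2.0
erased = threshold_statistic(a[:1], b[:1])         # second color erased: 1.0
merged = threshold_statistic([sum(a)], [sum(b)])   # colors merged: 0.0
```

Here only the full two-color statistic exceeds the threshold value 1 of Theorem 3.3, in line with the monotonicity in $L$ and the Cauchy-Schwarz relation discussed above.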


Remark 3.1. Examining the proof of Theorem 3.3, we may see that it is not necessary for the
number of colors $L$ to be finite. Indeed, as long as we have

$$\sum_{\ell=1}^{\infty} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1$$

in the infinite case, we will also have $\liminf_{n \to \infty} \frac{nI}{\log n} > 1$, implying the desired result.

As will be seen in the proof of Theorem 3.3 below, we have the characterization

$$\lim_{n \to \infty} \frac{nI}{\log n} = \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2$$

of the Renyi divergence. Hence, inequality (8) governs whether $I < \frac{\log n}{n}$ or $I > \frac{\log n}{n}$ for large $n$.
As will be illustrated in the computation appearing in the proof of Theorem 3.3, the inequality
$I > \frac{\log n}{n}$ implies that the right side of inequality (3) tends to 0 as $n \to \infty$. On the other hand, the
next theorem guarantees that if $I < \frac{\log n}{n}$, we have $\mathbb{P}(F)$ bounded away from 0. Hence, the success
or failure of maximum likelihood occurs with respect to a sharp threshold that is encoded within
the Renyi divergence. In the next theorem, we will make the additional assumption that

$$a_\ell, b_\ell > 0, \quad \forall\, 1 \le \ell \le L, \qquad (10)$$

meaning the probabilities of all $L$ colors are nonzero both within and between communities.
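
The relationship between the Renyi divergence and the quantity in inequality (8) can be checked numerically for the distributions (6) and (7). The following sketch (the function name `scaled_renyi` and the parameter values are assumptions for illustration) evaluates $nI/\log n$ for increasing $n$ and compares it with $\sum_\ell (\sqrt{a_\ell} - \sqrt{b_\ell})^2$.

```python
import math

def scaled_renyi(a, b, n):
    """n * I / log(n) for the distributions (6)-(7) with parameters a, b."""
    s = math.log(n) / n
    # Index 0 is the "no edge" symbol; indices 1..L are the colors.
    p = [1 - sum(a) * s] + [x * s for x in a]
    q = [1 - sum(b) * s] + [x * s for x in b]
    bc = sum(math.sqrt(pl * ql) for pl, ql in zip(p, q))
    return -2.0 * math.log(bc) / s

a, b = [4.0, 1.0], [1.0, 4.0]
limit = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))
values = [scaled_renyi(a, b, n) for n in (10**3, 10**5, 10**7)]
```

For this choice of parameters the values decrease toward the limiting value 2 as $n$ grows, with a gap of order $\log n / n$.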
Theorem 3.4 (Proof in Section 4.5). Suppose the condition (10) holds. If

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 < 1,$$

then for any $K \ge 2$ and for sufficiently large $n$, the maximum likelihood estimator fails with
probability at least 

Viewed from another angle, Theorems 3.3 and 3.4 imply that the quantity $\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2$
determines a sharp threshold for when exact recovery is possible in the $K$-community weighted
stochastic block model; when the quantity is larger than 1, the maximum likelihood estimator succeeds
with probability converging to 1, whereas when the quantity is smaller than 1, the maximum
likelihood estimator fails with probability bounded away from 0. Also note that the quantity has the
form of a squared Hellinger distance between $a$ and $b$ (up to a constant factor), although $a$ and $b$ need not be the probability mass
functions of discrete distributions, since their components do not necessarily sum to 1.

Remark 3.2. The assumption (10) appears to be an undesirable artifact of the technique used to 
prove Theorem 3.4, which involves bounding appropriate functions of the likelihood ratio between 
within-community and between-community distributions. However, it appears that a substantially 
different approach may be required to handle the case when assumption (10) does not necessarily 
hold. Furthermore, note that our argument also requires the likelihood ratio to be bounded by some
constant $M$. Hence, although our impossibility proof continues to hold when $L$ is infinite, it then
requires a bound of the form

$$\sup_{\ell > 0} \left\{ \left| \log \frac{a_\ell}{b_\ell} \right| \right\} < M.$$

(Such a bound clearly holds for finite values of $L$.)


We also note that the results of Theorems 3.3 and 3.4 could be generalized further to include 
a mixture of discrete and continuous distributions. In other words, the distributions of $p_n(x)$ and
$q_n(x)$ could follow arbitrary (discrete or continuous) distributions for the nonzero values, as long
as

$$p_n(0) = 1 - \frac{u \log n}{n} \quad \text{and} \quad q_n(0) = 1 - \frac{v \log n}{n}.$$

This reflects the fact that the graph is still fairly sparse, with average degree scaling as $\Theta(\log n)$.
However, whenever two nodes are connected by an edge, the corresponding edge weight
may follow a more general distribution.


3.3 Censored block models and graphical channels 

We now discuss the relationship between our results and the notion of graphical channels introduced 
by Abbe and Montanari [3]. Recall that a graphical channel takes as input a labeling of vertices on 
a graph, and each edge is encoded by a deterministic function of the adjacent vertices. The edges 
are then passed through a channel, and the output is observed. 

Abbe et al. [1] analyze a specific instantiation of a discrete graphical channel known as the
censored block model. In this case, the node labelings are binary, and edges are encoded using the
XOR operation on adjacent vertices. The channel is a discrete memoryless channel with output
alphabet $\{*, 0, 1\}$, and for fixed probabilities $p, q_1, q_2 \in [0, 1]$, the transition matrix of the channel
is given by

$$\begin{array}{c|ccc} & * & 0 & 1 \\ \hline 0 & 1 - p & p(1 - q_1) & p q_1 \\ 1 & 1 - p & p(1 - q_2) & p q_2 \end{array}$$

In other words, an edge is replaced by $*$ with probability $1 - p$, and is otherwise flipped with
probability $q_1$ or $1 - q_2$, depending on whether the transmitted edge label is 0 or 1. Clearly, the
observed graph may be viewed as a special case of the discrete model described in Section 3.2, with
$K = 2$ and $L = 2$, where $*$ represents an empty edge and the two “colors” are represented by 0 and
1. This leads to the following result, a corollary of Theorems 3.3 and 3.4:


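Corollary 3.2. In the censored block model, suppose

$$\liminf_{n \to \infty} \frac{pn}{\log n} \left[ \left( \sqrt{1 - q_1} - \sqrt{1 - q_2} \right)^2 + \left( \sqrt{q_1} - \sqrt{q_2} \right)^2 \right] > 1.$$

Then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$. On
the other hand, if

$$\limsup_{n \to \infty} \frac{pn}{\log n} \left[ \left( \sqrt{1 - q_1} - \sqrt{1 - q_2} \right)^2 + \left( \sqrt{q_1} - \sqrt{q_2} \right)^2 \right] < 1,$$

then the maximum likelihood estimator fails with probability bounded away from 0.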
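
The quantity appearing in Corollary 3.2 is straightforward to evaluate for given channel parameters. The sketch below uses a hypothetical function name, `censored_statistic`, and illustrative parameter values; it simply computes the expression inside the liminf and limsup for a single value of $n$.

```python
import math

def censored_statistic(p, q1, q2, n):
    """(p n / log n) * [(sqrt(1-q1) - sqrt(1-q2))^2 + (sqrt(q1) - sqrt(q2))^2]."""
    gap = (math.sqrt(1 - q1) - math.sqrt(1 - q2)) ** 2 \
        + (math.sqrt(q1) - math.sqrt(q2)) ** 2
    return p * n / math.log(n) * gap

n = 10**4
p = math.log(n) / n                                # edges revealed at rate log(n)/n
noiseless = censored_statistic(p, 0.0, 1.0, n)     # labels never flipped
useless = censored_statistic(p, 0.3, 0.3, n)       # output independent of input
```

With no flipping noise the statistic equals 2 at this revelation rate, so the sufficient condition holds; when $q_1 = q_2$ the channel output carries no information about the edge label and the statistic is 0.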


Sharp thresholds were derived for the censored block model by Abbe et al. [1] and Hajek et
al. [18] when $K = 2$ and $q_1 = 1 - q_2 = \epsilon$, in the cases where $\epsilon \to \frac{1}{2}$ and $\epsilon \in [0, 1]$, respectively. It
is easy to check that their thresholds agree with ours. On the other hand, Corollary 3.2 does not
require the graphical channel to flip edge labels with equal probability, and we may slightly relax
the scaling requirement $p \asymp \frac{\log n}{n}$ in the statement of our corollary. Furthermore, the theorems in
Section 3.2 clearly hold for more general graphical channels aside from the channel giving rise to
the censored block model; we may have more than two labels for each node, corresponding to a
larger codebook, and the output alphabet of the channel may be arbitrarily large. Translated into
the language of graphical channels, our results from Section 3.2 show the following:


Corollary 3.3. Consider a graphical channel, where node inputs are binary and edges are encoded
using an XOR operation. The edges are passed through a discrete memoryless channel that maps
each edge to a discrete label $\ell \in \{1, \dots, L\}$, with probability $\frac{a_\ell \log n}{n}$ for edges encoded with 0 and
probability $\frac{b_\ell \log n}{n}$ for edges encoded with 1, and erases edges with probabilities $1 - \frac{u \log n}{n}$ and
$1 - \frac{v \log n}{n}$, respectively. Let $I$ denote the Renyi divergence of order $\frac{1}{2}$ between the two output distributions.
If $\liminf_{n \to \infty} \frac{nI}{\log n} > 1$, the maximum likelihood decoder succeeds with probability tending to 1. If
$\limsup_{n \to \infty} \frac{nI}{\log n} < 1$, the maximum likelihood decoder fails with probability bounded away from 0.


As noted by Abbe and Sandon [4] in a slightly different setting, the threshold for reliable communication
in a graphical channel is governed by a different quantity from the mutual information
between the input distribution and the output of the channel, which arises from the analysis of 
channel capacity in traditional channel coding theory. This is because the encoding of the graphical 
channel is already built into the stochastic block model framework, rather than being optimized 
by the user. It is interesting to observe that Renyi divergence and Hellinger distance are the 
information-theoretic quantities that determine the “capacity” of graphical channels in the case of 
equal-sized communities. 
3.4 Thresholds for submatrix localization


The stochastic block model framework described in this paper also has natural connections to the
submatrix localization problem, in which our more general framework involving arbitrary (discrete
or continuous) distributions is useful in deriving thresholds for exact recovery. The goal in submatrix
localization is to partition the rows and columns of a random matrix $A \in \mathbb{R}^{n_L \times n_R}$ into disjoint
subsets $\{C_1, \dots, C_K\}$ and $\{D_1, \dots, D_K\}$, where $n_L = \sum_{k=1}^{K} |C_k|$ and $n_R = \sum_{k=1}^{K} |D_k|$. For each
$1 \le k \le K$, the entries $(i, j) \in C_k \times D_k$ are drawn i.i.d. from a distribution $G$ with mean $\mu_n > 0$,
and all other entries in $A$ are drawn from the recentered distribution $G - \mu_n$.

Chen and Xu [11] derive impossibility and achievability results for submatrix localization when
$|C_k| = K_L$ and $|D_k| = K_R$; i.e., all row subsets have the same size, as do all column subsets. Furthermore, the
distribution $G$ is assumed to be sub-Gaussian with parameter 1. Chen and Xu [11] show that the
maximum likelihood estimator succeeds with probability tending to 1 when

$$\mu_n^2 \ge \frac{C_1 \log n}{\min\{K_L, K_R\}}. \qquad (11)$$

Furthermore, if $G \sim \mathcal{N}(\mu_n, 1)$, the probability that maximum likelihood fails is bounded away from
0 when

$$\mu_n^2 \le \frac{1}{12} \max\left\{ \frac{\log(n_R - K_R)}{K_L}, \frac{\log(n_L - K_L)}{K_R} \right\}. \qquad (12)$$

Specializing to the case when $K_L = K_R = n$, inequalities (11) and (12) imply the existence of a
threshold at $\mu_n^2 = \Theta\left(\frac{\log n}{n}\right)$, although the value of the constant has not been determined precisely.


When $K_L = K_R = n$, the results in Section 3.1 may be applied to obtain sufficient conditions
under which the maximum likelihood estimator succeeds for the submatrix localization problem
with probability converging to 1. We have the following result, which follows directly from Corollary 3.1
and the computation $I = \frac{\mu_n^2}{4}$ in the case when $G \sim \mathcal{N}(\mu_n, 1)$.

Corollary 3.4. Suppose $K_L = K_R = n$, and let $I$ denote the Renyi divergence of order $\frac{1}{2}$
between the distributions $G$ and $G - \mu_n$. Suppose

$$\liminf_{n \to \infty} \frac{nI}{\log n} > 1. \qquad (13)$$

Then the maximum likelihood estimator succeeds with probability converging to 1. In particular,
when $G \sim \mathcal{N}(\mu_n, 1)$, maximum likelihood succeeds if

$$\liminf_{n \to \infty} \frac{n \mu_n^2}{\log n} > 4. \qquad (14)$$

Note that the condition (14) matches inequality (11), with a specific value for the
constant. Furthermore, the sufficient condition (13) in Corollary 3.4 may be of independent interest
in obtaining thresholds for a general version of the submatrix localization problem, where the
remaining entries in the matrix are drawn from a distribution $G'$ rather than a shifted version of
$G$. For instance, if $G \sim \mathcal{N}(\mu_n, \sigma_n^2)$ and $G' \sim \mathcal{N}(\mu_n', \sigma_n'^2)$, the sufficient condition for exact recovery
in Corollary 3.4 becomes

$$\liminf_{n \to \infty} \frac{\dfrac{(\mu_n - \mu_n')^2}{4 \bar{\sigma}_n^2} + \log \dfrac{\bar{\sigma}_n^2}{\sigma_n \sigma_n'}}{\dfrac{\log n}{n}} > 1,$$

where $\bar{\sigma}_n^2 := \frac{\sigma_n^2 + \sigma_n'^2}{2}$. Although we do not yet have techniques for deriving impossibility results in
the general submatrix localization setting, we conjecture that the upper bounds of Corollary 3.4
based on the Renyi divergence may be tight here, as well.
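
The Gaussian computations used in this subsection can be verified numerically. The sketch below compares the standard closed form of the Renyi divergence of order 1/2 between two Gaussians with a direct numerical evaluation of $-2 \log \int \sqrt{p(x) q(x)}\, dx$. The function names, the integration range, and the step count are assumptions made for this illustration.

```python
import math

def renyi_half_gaussian(mu1, var1, mu2, var2):
    """Closed form of -2 log int sqrt(p q) for N(mu1, var1) and N(mu2, var2)."""
    avg = 0.5 * (var1 + var2)   # average of the two variances
    return (mu1 - mu2) ** 2 / (4 * avg) + math.log(avg / math.sqrt(var1 * var2))

def renyi_half_numeric(mu1, var1, mu2, var2, lo=-20.0, hi=20.0, steps=20_000):
    """Midpoint-rule approximation of the same quantity."""
    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.sqrt(pdf(x, mu1, var1) * pdf(x, mu2, var2))
    return -2.0 * math.log(total * h)

# Unit variances and mean shift mu = 2: the divergence is mu^2 / 4 = 1.
shifted = renyi_half_gaussian(2.0, 1.0, 0.0, 1.0)
# Unequal means and variances, compared against numerical integration.
general = renyi_half_gaussian(1.0, 2.0, -0.5, 0.5)
general_numeric = renyi_half_numeric(1.0, 2.0, -0.5, 0.5)
```

The first case reproduces $I = \mu_n^2 / 4$ for unit-variance Gaussians, and the second checks the general expression with unequal variances.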
4 Proofs of theorems


In this section, we outline the proofs of the main theorems. Detailed proofs of the more technical 
lemmas are contained in the appendix. 


4.1 Proof of Theorem 3.1 

We first show that the result holds when $p_n$ and $q_n$ are absolutely continuous with respect to each
other. We provide a proof for the case when $p_n$ and $q_n$ are continuous distributions; the result for
discrete distributions follows by replacing the integrals with sums. When $p_n$ and $q_n$ are
not absolutely continuous with respect to each other, we establish the theorem for the two cases
(continuous and discrete distributions) separately.

Define the function

$$d_n(x) := \log \frac{p_n(x)}{q_n(x)}.$$

We have the following lemma:


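Lemma 4.1. Let the sets of vertices constituting the two communities be denoted by $A$ and $B$. If
the maximum likelihood estimator does not coincide with the truth, then there exist $1 \le k \le \frac{n}{2}$ and
sets $A_w \subseteq A$ and $B_w \subseteq B$ such that $|A_w| = |B_w| = k$, and

$$S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w). \qquad (15)$$

Here, $\bar{A}_w = A \setminus A_w$, $\bar{B}_w = B \setminus B_w$, and for disjoint sets of vertices $\mathcal{A}$ and $\mathcal{B}$,

$$S(\mathcal{A}, \mathcal{B}) := \sum_{i \in \mathcal{A},\, j \in \mathcal{B}} d_n(w_{ij}).$$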

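Proof. Consider an assignment, different from the truth, that is at least as likely as the truth (such as
the maximum likelihood estimate). For this assignment, let $A_w$ and $B_w$ be the sets of misclassified nodes.
Without loss of generality, we will assume that $k = |A_w| = |B_w| \le n/2$. For disjoint sets of vertices
$\mathcal{A}$ and $\mathcal{B}$, define

$$p_n(\mathcal{A}, \mathcal{B}) = \prod_{i \in \mathcal{A},\, j \in \mathcal{B}} p_n(w_{ij}),$$

and define $q_n(\mathcal{A}, \mathcal{B})$ analogously. Since the new assignment is at least as likely as the truth, we must
have

$$p_n(A_w, \bar{A}_w)\, p_n(B_w, \bar{B}_w)\, q_n(A_w, \bar{B}_w)\, q_n(\bar{A}_w, B_w) \le q_n(A_w, \bar{A}_w)\, q_n(B_w, \bar{B}_w)\, p_n(A_w, \bar{B}_w)\, p_n(\bar{A}_w, B_w).$$

Taking logarithms, this immediately implies that

$$S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w),$$

completing the proof. □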
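The bookkeeping behind Lemma 4.1 can be checked by brute force on a small graph. The sketch below draws discrete edge weights for two communities of four nodes each, and verifies that the log-likelihood of the truth minus the log-likelihood of a swapped assignment equals $S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) - S(A_w, \bar{B}_w) - S(\bar{A}_w, B_w)$ for every swap with $k \in \{1, 2\}$. The distributions `p` and `q`, the graph size, and all function names are hypothetical choices made for illustration.

```python
import itertools
import math
import random

def d(w, p, q):
    """Log-likelihood ratio d_n(w) = log(p(w) / q(w)) for a discrete edge weight w."""
    return math.log(p[w] / q[w])

def S(weights, set_a, set_b, p, q):
    """S(A, B): sum of d_n(w_ij) over i in A, j in B (A and B disjoint)."""
    return sum(d(weights[frozenset((i, j))], p, q) for i in set_a for j in set_b)

def log_likelihood(weights, labels, p, q):
    """Log-likelihood of all edge weights under a two-community labelling."""
    total = 0.0
    for i, j in itertools.combinations(sorted(labels), 2):
        dist = p if labels[i] == labels[j] else q
        total += math.log(dist[weights[frozenset((i, j))]])
    return total

def swap_gap(weights, A, B, A_w, B_w, p, q):
    """Left side minus right side of the inequality in Lemma 4.1."""
    A_bar, B_bar = A - A_w, B - B_w
    return (S(weights, A_w, A_bar, p, q) + S(weights, B_w, B_bar, p, q)
            - S(weights, A_w, B_bar, p, q) - S(weights, A_bar, B_w, p, q))

def max_identity_error(seed=0):
    """Largest discrepancy between the likelihood difference and swap_gap over all swaps."""
    rng = random.Random(seed)
    p = [0.5, 0.3, 0.2]  # hypothetical within-community distribution
    q = [0.7, 0.2, 0.1]  # hypothetical between-community distribution
    A, B = {0, 1, 2, 3}, {4, 5, 6, 7}
    truth = {v: 0 for v in A}
    truth.update({v: 1 for v in B})
    weights = {}
    for i, j in itertools.combinations(range(8), 2):
        dist = p if truth[i] == truth[j] else q
        weights[frozenset((i, j))] = rng.choices(range(3), weights=dist)[0]
    base = log_likelihood(weights, truth, p, q)
    worst = 0.0
    for k in (1, 2):
        for A_w in itertools.combinations(sorted(A), k):
            for B_w in itertools.combinations(sorted(B), k):
                swapped = dict(truth)
                for v in A_w:
                    swapped[v] = 1
                for v in B_w:
                    swapped[v] = 0
                diff = base - log_likelihood(weights, swapped, p, q)
                gap = swap_gap(weights, A, B, set(A_w), set(B_w), p, q)
                worst = max(worst, abs(diff - gap))
    return worst

print(max_identity_error())
```

A swapped assignment is at least as likely as the truth exactly when `swap_gap` is nonpositive, which is the inequality stated in the lemma.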

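Let $F$ be the event that the maximum likelihood estimate does not coincide with the truth. For
fixed sets $A_w$ and $B_w$ of size $k$, denote

$$P_n^{(k)} = \mathbb{P}\left( S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w) \right).$$

By Lemma 4.1 and a union bound, we have

$$\mathbb{P}(F) \le \sum_{k=1}^{n/2} \binom{n}{k}^2 P_n^{(k)}. \tag{16}$$

Let $\{X_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $p_n$, and let $\{Y_i\}_{i \ge 1}$
be a sequence of i.i.d. random variables distributed according to $q_n$. For a natural number $N > 0$,
define the expression

$$T(N, p_n, q_n, \epsilon) = \mathbb{P}\left( \sum_{i=1}^{N} \big( d_n(Y_i) - d_n(X_i) \big) \ge \epsilon \right). \tag{17}$$

Then

$$P_n^{(k)} = \mathbb{P}\left( \sum_{i=1}^{2k(n-k)} d_n(Y_i) - \sum_{i=1}^{2k(n-k)} d_n(X_i) \ge 0 \right) = T(2k(n-k), p_n, q_n, 0). \tag{18}$$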


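Let $Z_i = d_n(Y_i) - d_n(X_i)$. The moment generating function of $Z_i$ is then given by

$$M(t) = \mathbb{E}\left[ e^{t d_n(Y_i)} \right] \mathbb{E}\left[ e^{-t d_n(X_i)} \right] = \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{t} q_n(x)\, dx \right) \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{-t} p_n(x)\, dx \right).$$

Let $t^*$ be the point where $M(t)$ is minimized for $t > 0$. We evaluate $t^*$ by differentiating $M(t)$
and setting the derivative to 0, as follows:

$$M'(t) = \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{t} \log \frac{p_n(x)}{q_n(x)}\, q_n(x)\, dx \right) \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{-t} p_n(x)\, dx \right) + \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{t} q_n(x)\, dx \right) \left( \int \left( \frac{p_n(x)}{q_n(x)} \right)^{-t} \log \frac{q_n(x)}{p_n(x)}\, p_n(x)\, dx \right).$$

Note that if we substitute $t = 1/2$ in the above expression, we obtain

$$M'(1/2) = \left( \int \sqrt{p_n(x) q_n(x)} \log \frac{p_n(x)}{q_n(x)}\, dx \right) \left( \int \sqrt{p_n(x) q_n(x)}\, dx \right) + \left( \int \sqrt{p_n(x) q_n(x)}\, dx \right) \left( \int \sqrt{p_n(x) q_n(x)} \log \frac{q_n(x)}{p_n(x)}\, dx \right) = 0.$$

Since $M(t)$ is a convex function, we conclude that $t^* = 1/2$. Substituting, we then obtain

$$M(t^*) = \left( \int \sqrt{p_n(x) q_n(x)}\, dx \right)^2.$$

In particular,

$$I = -\log M(t^*) = -2 \log \int \sqrt{p_n(x) q_n(x)}\, dx$$

is the Renyi divergence defined in the statement of the theorem.

By a Chernoff bound on the sum, we have

$$\mathbb{P}\left( \sum_{i=1}^{2k(n-k)} \big( d_n(Y_i) - d_n(X_i) \big) \ge 0 \right) \le \left( \inf_{t > 0} M(t) \right)^{2k(n-k)} = \left( \int \sqrt{p_n(x) q_n(x)}\, dx \right)^{4k(n-k)} = \exp\left( -2k(n-k) I \right).$$

Using $\binom{n}{k} \le \left( \frac{ne}{k} \right)^k$ and substituting into inequality (16), we arrive at the bound

$$\mathbb{P}(F) \le \sum_{k=1}^{n/2} \left( \frac{ne}{k} \right)^{2k} \exp\left( -2k(n-k) I \right) = \sum_{k=1}^{n/2} \exp\left( 2k \left( \log \frac{n}{k} + 1 \right) - 2k(n-k) I \right), \tag{19}$$

which is exactly inequality (3). As noted earlier, the proof for discrete distributions that are absolutely
continuous with respect to each other follows exactly the same steps as above, and we will not repeat them here.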
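For discrete distributions the minimization of $M(t)$ is easy to check numerically. The following sketch, which assumes a common finite support and uses hypothetical distributions `p` and `q`, evaluates $M(t) = \big(\sum_x (p/q)^t q\big)\big(\sum_x (p/q)^{-t} p\big)$ on a grid and confirms that the minimizer is $t = 1/2$ and that $-\log M(1/2)$ agrees with $-2 \log \sum_x \sqrt{p(x) q(x)}$.

```python
import math

def mgf(t, p, q):
    """M(t) for Z = d(Y) - d(X), with X ~ p, Y ~ q and d = log(p / q)."""
    first = sum((pi / qi) ** t * qi for pi, qi in zip(p, q))
    second = sum((pi / qi) ** (-t) * pi for pi, qi in zip(p, q))
    return first * second

def renyi_half(p, q):
    """Renyi divergence of order 1/2: -2 log sum_x sqrt(p(x) q(x))."""
    return -2.0 * math.log(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))

def argmin_on_grid(p, q, grid_size=1001):
    """Grid search for the minimizer of M(t) over t in [0, 1]."""
    ts = [i / (grid_size - 1) for i in range(grid_size)]
    return min(ts, key=lambda t: mgf(t, p, q))

p = [0.5, 0.3, 0.2]  # hypothetical within-community distribution
q = [0.7, 0.2, 0.1]  # hypothetical between-community distribution
print(argmin_on_grid(p, q), -math.log(mgf(0.5, p, q)), renyi_half(p, q))
```

The grid search is only a sanity check; the argument in the text relies on convexity of $M$ and on the symmetry of $M(t)$ about $t = 1/2$.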

We now turn to the case where $p_n$ and $q_n$ are not necessarily absolutely continuous with respect
to each other.

Case 1: pn and qn are continuous distributions. Our strategy is to deliberately create a noisy 
version of the edges by adding a small Gaussian random variable to the existing edge weights, 
and then apply the maximum likelihood estimator to the new noisy graph. Naturally, this new 
estimator is worse than directly using a maximum likelihood estimator for the original distributions; 
however, the benefit of adding noise is that it makes the new distributions absolutely continuous 
with respect to each other. For some $\nu > 0$, we write $\tilde{p}_n = p_n \star \mathcal{N}(0, \nu^2)$ and $\tilde{q}_n = q_n \star \mathcal{N}(0, \nu^2)$,
where $\star$ represents convolution. Let the Renyi divergence between $\tilde{p}_n$ and $\tilde{q}_n$ be denoted by $I_\nu$. Using
the argument for absolutely continuous distributions, we conclude that

$$\mathbb{P}(F) \le \sum_{k=1}^{n/2} \exp\left( 2k \left( \log \frac{n}{k} + 1 \right) - 2k(n-k) I_\nu \right). \tag{20}$$

We claim that $\lim_{\nu \to 0} I_\nu = I$, which implies the desired result. From van Erven and Harremoes [37],
the Renyi divergence is uniformly continuous in $(P, Q)$, with respect to the total variation topology.
Hence, it suffices to show that

$$\lim_{\nu \to 0} \| \tilde{p}_n - p_n \|_1 = 0, \quad \text{and} \quad \lim_{\nu \to 0} \| \tilde{q}_n - q_n \|_1 = 0. \tag{21}$$

The proof of the above fact is standard and may be found in Theorem 6.20 of Knapp [23] or the 
lecture notes [6]. 

Case 2: pn and qn are discrete distributions. Similar to the case of continuous distributions, 
we deliberately create a noisy graph and use the maximum likelihood estimator on this new graph. 
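We fix an $\epsilon > 0$ and assume, without loss of generality, that $p_n(0), q_n(0) > 0$. We first replace every
edge with weight 0 in the original graph by an edge with weight $\ell$, with probability $\epsilon / 2^{\ell}$, for all $\ell \ge 1$.
Thus, the new edge weight distributions are given by $\tilde{p}_n$ and $\tilde{q}_n$, where

$$\tilde{p}_n(0) = p_n(0)(1 - \epsilon), \quad \text{and} \quad \tilde{p}_n(\ell) = p_n(\ell) + \frac{\epsilon\, p_n(0)}{2^{\ell}}, \quad \text{for } \ell \ge 1, \text{ and}$$

$$\tilde{q}_n(0) = q_n(0)(1 - \epsilon), \quad \text{and} \quad \tilde{q}_n(\ell) = q_n(\ell) + \frac{\epsilon\, q_n(0)}{2^{\ell}}, \quad \text{for } \ell \ge 1.$$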

Since $\tilde{p}_n$ and $\tilde{q}_n$ are absolutely continuous with respect to each other, we have

$$\mathbb{P}(F) \le \sum_{k=1}^{n/2} \exp\left( 2k \left( \log \frac{n}{k} + 1 \right) - 2k(n-k) I_\epsilon \right), \tag{22}$$

where $I_\epsilon$ is the Renyi divergence between $\tilde{p}_n$ and $\tilde{q}_n$. It is easy to see that as $\epsilon \to 0$, we have
$\| \tilde{p}_n - p_n \|_1 \to 0$ and $\| \tilde{q}_n - q_n \|_1 \to 0$. Again using the continuity of the Renyi divergence from van
Erven and Harremoes [37], we conclude that

$$\lim_{\epsilon \to 0} I_\epsilon = I,$$

which concludes the proof.
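The continuity step in Case 2 can be illustrated on a pair of discrete distributions that are not absolutely continuous with respect to each other. The sketch below moves a fraction $\epsilon$ of the mass at 0 onto the remaining labels and tracks $|I_\epsilon - I|$ as $\epsilon$ shrinks. The uniform split over a finite support and the example distributions are assumptions made purely for illustration; any split that puts positive mass on every label serves the same purpose.

```python
import math

def renyi_half(p, q):
    """Renyi divergence of order 1/2: -2 log sum_l sqrt(p(l) q(l))."""
    return -2.0 * math.log(sum(math.sqrt(x * y) for x, y in zip(p, q)))

def perturb(dist, eps):
    """Move a fraction eps of the mass at label 0 uniformly onto labels 1..L (assumed split)."""
    L = len(dist) - 1
    moved = eps * dist[0]
    return [dist[0] - moved] + [mass + moved / L for mass in dist[1:]]

p = [0.6, 0.4, 0.0]  # hypothetical: p puts no mass on label 2
q = [0.7, 0.0, 0.3]  # hypothetical: q puts no mass on label 1
I = renyi_half(p, q)
gaps = [abs(renyi_half(perturb(p, e), perturb(q, e)) - I) for e in (1e-2, 1e-4, 1e-6)]
print(I, gaps)
```

In this example the gap shrinks roughly like $\sqrt{\epsilon}$, consistent with $I_\epsilon \to I$.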

4.2 Proof of Corollary 3.1 

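Note that for sufficiently large $n$, we have $I \ge \frac{(1+\epsilon) \log n}{n}$ for some $\epsilon > 0$. Substituting into the
bound (3) of Theorem 3.1, we therefore have

$$\begin{aligned}
\mathbb{P}(F) &\le \sum_{k=1}^{n/2} \exp\left( 2k \left( \log \frac{n}{k} + 1 \right) - 2k(n-k) \frac{(1+\epsilon) \log n}{n} \right) \\
&= \sum_{k=1}^{n/2} \exp\Big( 2k \big( \log n - \log k + 1 - (1 - k/n)(1+\epsilon) \log n \big) \Big) \\
&= \sum_{k=1}^{n/2} \exp\Big( 2k \big( -\log k + 1 - (\epsilon - k/n - k\epsilon/n) \log n \big) \Big) \\
&= \sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right).
\end{aligned}$$

We break up the summation into two parts:

$$\sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right) = \sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right) + \sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right).$$

For $3 \le k \le \frac{n}{2}$, we have

$$\log k - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} = \log k - \frac{k (1+\epsilon) \log n}{n} \ge \frac{\log k}{3}. \tag{23}$$

This is because the function $\frac{\log x}{x}$ is decreasing for $x \ge 3$, so we only need to verify that

$$\frac{2}{3k} \log k \ge \frac{(1+\epsilon) \log n}{n} \tag{24}$$

holds at $k = n/2$. This is equivalent to checking that

$$\frac{4}{3n} \log\left( \frac{n}{2} \right) = \frac{4 \log n}{3n} - \frac{4 \log 2}{3n} \ge \frac{(1+\epsilon) \log n}{n},$$

which indeed holds for sufficiently large $n$ (taking $\epsilon < 1/3$, which we may assume without loss of generality).
Combining inequalities (23) and (24), we then obtain

$$\sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right) \le \sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right) + \sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( -1 + \frac{\log k}{3} \right) \right). \tag{25}$$

The first term in inequality (25) may be bounded as follows:

$$\sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k \left( \log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} \right) \right) = n^{-2\epsilon} \exp\left( 2 + \frac{2(1+\epsilon) \log n}{n} \right) + n^{-4\epsilon} \exp\left( -4 \left( \log 2 - 1 - \frac{2 \log n}{n} - \frac{2 \epsilon \log n}{n} \right) \right) \le C n^{-2\epsilon},$$

for a suitable constant $C$. For the second term in inequality (25), note that

$$\begin{aligned}
\sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k \left( -1 + \frac{\log k}{3} \right) \right)
&\le n^{-6\epsilon} \sum_{k=3}^{n/2} \exp\left( -2k \left( -1 + \frac{\log k}{3} \right) \right) \\
&\le n^{-6\epsilon} \sum_{k=3}^{\lceil e^6 \rceil} \exp\left( -2k \left( -1 + \frac{\log k}{3} \right) \right) + n^{-6\epsilon} \sum_{k=\lceil e^6 \rceil + 1}^{\infty} \exp\left( -2k \left( -1 + \frac{\log k}{3} \right) \right) \\
&\le C_1 n^{-6\epsilon} + n^{-6\epsilon} \sum_{k=\lceil e^6 \rceil + 1}^{\infty} \exp(-2k) \\
&= O(n^{-6\epsilon}).
\end{aligned}$$

Thus, we conclude that

$$\mathbb{P}(F) \le C_2 n^{-2\epsilon}, \tag{26}$$

for a suitable constant $C_2$, implying that $\mathbb{P}(F) \to 0$ as $n \to \infty$. This concludes the proof.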
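To get a feel for the decay in (26), the sketch below evaluates the logarithm of the bound $\sum_{k=1}^{n/2} \exp\big(2k(\log(n/k) + 1) - 2k(n-k)I\big)$ at $I = (1+\epsilon)\log n / n$ for a few values of $n$, using a log-sum-exp to avoid overflow. The value $\epsilon = 0.5$ and the function names are illustrative assumptions.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_failure_bound(n, I):
    """log of sum_{k=1}^{n/2} exp(2k(log(n/k) + 1) - 2k(n - k) I)."""
    terms = [2 * k * (math.log(n / k) + 1.0) - 2 * k * (n - k) * I
             for k in range(1, n // 2 + 1)]
    return log_sum_exp(terms)

eps = 0.5  # hypothetical margin above the threshold
for n in (10**3, 10**4, 10**5):
    I = (1.0 + eps) * math.log(n) / n
    print(n, log_failure_bound(n, I))
```

When $I$ falls below $\log n / n$, the $k = 1$ term alone already exceeds 1, so the bound becomes vacuous; this matches the role of the threshold in Corollary 3.1.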


4.3 Proof of Theorem 3.2 


We will follow the arguments used in the proof of Theorem 3.2 of Zhang and Zhou [40]. 

We label the nodes $\{1, 2, \dots, nK\}$. Without loss of generality, suppose community $i$ comprises
the nodes $\{(i-1)n + 1, \dots, in\}$, and denote the corresponding assignment mapping nodes to communities
by $\sigma_0$. Let $A_{nK \times nK}$ be the adjacency matrix for the graph, where $A_{ij} \in \{0, \dots, L\}$ is the color
of edge $(i, j)$. Just as in the $K = 2$ case, the maximum likelihood estimator for $K > 2$ communities
seeks the partition that minimizes the weight of cross-community edges (equivalently, maximizes
the weight of within-community edges), where the weight of an edge with color $\ell \in \{0, 1, \dots, L\}$ is
given by

$$w_\ell = \log\left( \frac{p_n(\ell)}{q_n(\ell)} \right).$$

In other words, the maximum likelihood estimator $\hat{\sigma}$ satisfies

$$\hat{\sigma} = \arg\max_{\sigma} \sum_{i < j} \sum_{\ell=0}^{L} w_\ell \cdot 1\{A_{ij} = \ell\}\, 1\{\sigma(i) = \sigma(j)\} := \arg\max_{\sigma} T(\sigma).$$

Note that the value of $T(\sigma)$ remains the same for permutations of $\sigma$. To be precise, let $\Delta$ be the
set of all permutations from $\{1, \dots, K\}$ to $\{1, \dots, K\}$. For an assignment $\sigma$, denote

$$\Gamma(\sigma) = \{\sigma' : \exists\, \delta \in \Delta \text{ s.t. } \sigma' = \delta \circ \sigma\}.$$

We may check that for all $\sigma' \in \Gamma(\sigma)$, we have $T(\sigma') = T(\sigma)$. Thus, the maximum likelihood
estimator finds the best equivalence class $\Gamma$ such that any $\sigma \in \Gamma$ achieves the maximum value of $T$.

From the equivalence class $\Gamma$, we pick an assignment $\sigma$ that is closest to $\sigma_0$ in terms of the
Hamming distance. Let us denote this assignment by $\sigma(\Gamma)$; i.e.,

$$\sigma(\Gamma) \in \arg\min_{\sigma \in \Gamma} d_H(\sigma, \sigma_0),$$

where $d_H(\sigma, \sigma_0)$ denotes the Hamming distance between $\sigma$ and $\sigma_0$. We now define

$$P_m := \mathbb{P}\left\{ \exists\, \Gamma : d_H(\sigma(\Gamma), \sigma_0) = m \text{ and } T(\sigma(\Gamma)) \ge T(\sigma_0) \right\}.$$
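The choice of a representative $\sigma(\Gamma)$ can be made concrete with a brute-force search over label permutations. The sketch below relabels an assignment by every permutation of the $K$ labels and keeps the relabelling with the smallest Hamming distance to $\sigma_0$. Labels are 0-indexed here, and the function names and example assignments are hypothetical; exhaustive search over $K!$ permutations is only sensible for small $K$.

```python
import itertools

def hamming(sigma, sigma0):
    """Number of nodes on which two assignments disagree."""
    return sum(1 for a, b in zip(sigma, sigma0) if a != b)

def closest_representative(sigma, sigma0, K):
    """Relabelling of sigma (over all label permutations) closest to sigma0, with its distance."""
    best, best_dist = None, None
    for perm in itertools.permutations(range(K)):
        relabelled = [perm[label] for label in sigma]
        dist = hamming(relabelled, sigma0)
        if best_dist is None or dist < best_dist:
            best, best_dist = relabelled, dist
    return best, best_dist

sigma0 = [0, 0, 1, 1, 2, 2]  # hypothetical ground truth, K = 3 communities of size 2
sigma = [2, 2, 0, 0, 1, 0]   # same partition up to relabelling, except for the last node
print(closest_representative(sigma, sigma0, 3))
```

Since $T$ depends only on which pairs of nodes share a label, every relabelling in the same class attains the same value of $T$, so only the distance of the closest representative matters in the definition of $P_m$.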

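Let $\mathbb{P}(F)$ be the probability that the maximum likelihood estimator fails. Clearly,

$$\mathbb{P}(F) \le \sum_{m=1}^{nK} P_m. \tag{27}$$

Furthermore, we have the inequality

$$P_m \le \left| \{\Gamma : d_H(\sigma(\Gamma), \sigma_0) = m\} \right| \cdot \max_{\{\sigma :\, d_H(\sigma, \sigma_0) = m\}} \mathbb{P}\left( T(\sigma) \ge T(\sigma_0) \right).$$

We will bound each of the terms in the above product separately. For the first term, we use the
following lemma:

Lemma 4.2 (Proposition 5.2 of Zhang and Zhou [40]). The cardinality of the equivalence classes
$\Gamma$ such that $d_H(\sigma(\Gamma), \sigma_0) = m$ is bounded as follows:

$$\left| \{\Gamma : d_H(\sigma(\Gamma), \sigma_0) = m\} \right| \le \min\left\{ \left( \frac{enK^2}{m} \right)^m, K^{nK} \right\}.$$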

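Suppose there exists an assignment $\sigma$ such that $d_H(\sigma, \sigma_0) = m$ and $T(\sigma) \ge T(\sigma_0)$. This is
equivalent to

$$\sum_{i < j} \sum_{\ell=0}^{L} w_\ell \cdot 1\{A_{ij} = \ell\}\, 1\{\sigma(i) = \sigma(j)\} \ge \sum_{i < j} \sum_{\ell=0}^{L} w_\ell \cdot 1\{A_{ij} = \ell\}\, 1\{\sigma_0(i) = \sigma_0(j)\};$$

or

$$\sum_{\substack{(i,j) :\, \sigma(i) = \sigma(j), \\ \sigma_0(i) \ne \sigma_0(j)}} \sum_{\ell=0}^{L} w_\ell \cdot 1\{A_{ij} = \ell\} \ge \sum_{\substack{(i,j) :\, \sigma(i) \ne \sigma(j), \\ \sigma_0(i) = \sigma_0(j)}} \sum_{\ell=0}^{L} w_\ell \cdot 1\{A_{ij} = \ell\}.$$

Denoting

$$\gamma = \left| \{(i,j) : \sigma(i) = \sigma(j) \wedge \sigma_0(i) \ne \sigma_0(j)\} \right| \quad \text{and} \quad \alpha = \left| \{(i,j) : \sigma(i) \ne \sigma(j) \wedge \sigma_0(i) = \sigma_0(j)\} \right|,$$

we then have the bound

$$\mathbb{P}\left( T(\sigma) \ge T(\sigma_0) \right) = \mathbb{P}\left( \sum_{i=1}^{\gamma} d_n(Y_i) - \sum_{i=1}^{\alpha} d_n(X_i) \ge 0 \right) \le \inf_{t > 0} \left( \sum_{\ell=0}^{L} \left( \frac{p_n(\ell)}{q_n(\ell)} \right)^{t} q_n(\ell) \right)^{\gamma} \left( \sum_{\ell=0}^{L} \left( \frac{p_n(\ell)}{q_n(\ell)} \right)^{-t} p_n(\ell) \right)^{\alpha},$$

where as before, $d_n(\ell) = \log \frac{p_n(\ell)}{q_n(\ell)}$, and $X_i \sim p_n$ and $Y_i \sim q_n$, and we have used a Chernoff bound
in the above inequality. Taking $t = 1/2$, we then arrive at

$$\mathbb{P}\left( T(\sigma) \ge T(\sigma_0) \right) \le \left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^{\alpha + \gamma} = e^{-\frac{\alpha + \gamma}{2} I} \le e^{-\min(\alpha, \gamma) I}, \tag{28}$$

where $I$ denotes the Renyi divergence of order $\frac{1}{2}$ between $p_n$ and $q_n$. We then use the following
lemma:

Lemma 4.3 (Lemma 5.3 of Zhang and Zhou [40]). For $0 < m < nK$, the minimum of $\alpha$ and $\gamma$ is
bounded from below as follows:

$$\min(\alpha, \gamma) \ge \begin{cases} nm - m^2, & \text{if } m \le \frac{n}{2}, \\ \frac{2nm}{9}, & \text{if } m > \frac{n}{2}. \end{cases}$$

Substituting the bound from Lemma 4.3 into inequality (28), we obtain the upper bound

$$\mathbb{P}\left( T(\sigma) \ge T(\sigma_0) \right) \le \begin{cases} e^{(-nm + m^2) I}, & \text{if } m \le \frac{n}{2}, \\ e^{-\frac{2nm}{9} I}, & \text{if } m > \frac{n}{2}. \end{cases} \tag{29}$$

Finally, substituting the results of Lemma 4.2 and inequality (29) into inequality (27), we arrive at
the bound (4).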

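Note that we have

$$\exp\left( (-nm + m^2) I \right) \le \exp\left( -\frac{2mn}{9} I \right)$$

for $m \le \frac{n}{2}$. In particular, the bound (4) may be relaxed to obtain a bound of the form

$$\mathbb{P}(F) \le \sum_{m=1}^{m'} \min\left\{ \left( \frac{enK^2}{m} \right)^m, K^{nK} \right\} e^{(-nm + m^2) I} + \sum_{m=m'+1}^{nK} \min\left\{ \left( \frac{enK^2}{m} \right)^m, K^{nK} \right\} e^{-\frac{2nm}{9} I}, \tag{30}$$

for any $m' \le \lfloor \frac{n}{2} \rfloor$.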


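We now verify the sufficiency of inequality (5). Suppose that for some $\epsilon > 0$ and for all
sufficiently large $n$, we have

$$\frac{nI}{\log n} \ge 1 + \epsilon.$$

In particular, for $m' = \lfloor \frac{\epsilon n}{2} \rfloor$ and $m \in \{1, \dots, m'\}$, we have the bound

$$(n - m) I \ge (n - m') I \ge n \left( 1 - \frac{\epsilon}{2} \right) I \ge n \left( 1 - \frac{\epsilon}{2} \right) \frac{(1+\epsilon) \log n}{n} \ge \left( 1 + \frac{\epsilon}{3} \right) \log n,$$

for small enough $\epsilon$ and large enough $n$, implying that

$$P_m \le \left( \frac{enK^2}{m} \right)^m e^{-m(n-m) I} \le \left( \frac{enK^2}{m}\, n^{-(1 + \epsilon/3)} \right)^m \le \left( eK^2 n^{-\epsilon/3} \right)^m.$$

Thus,

$$\sum_{m=1}^{m'} P_m \le \sum_{m=1}^{m'} \left( eK^2 n^{-\epsilon/3} \right)^m \le \sum_{m=1}^{\infty} \left( eK^2 n^{-\epsilon/3} \right)^m \le \left( eK^2 n^{-\epsilon/3} \right) \sum_{m=0}^{\infty} \left( eK^2 n^{-\epsilon/3} \right)^m \le C_1 n^{-\epsilon/3},$$

where the last inequality follows because the geometric series converges for large enough $n$.
For $m \in \{m' + 1, \dots, nK\}$, we have the bound

$$P_m \le \left( \frac{enK^2}{m} e^{-\frac{2nI}{9}} \right)^m \le \left( \frac{enK^2}{m' + 1} e^{-\frac{2nI}{9}} \right)^m \le \left( \frac{2e}{\epsilon} K^2 e^{-\frac{2nI}{9}} \right)^m.$$

Note that for large enough $n$, we also have

$$\frac{2nI}{9} \ge \frac{2n}{9} \cdot \frac{(1+\epsilon) \log n}{n} \ge \frac{2 \log n}{9}.$$

Hence,

$$\frac{2e}{\epsilon} K^2 e^{-\frac{2nI}{9}} \le \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}},$$

implying the bound

$$\sum_{m=m'+1}^{nK} P_m \le \sum_{m=1}^{\infty} \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right)^m \le \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right) \sum_{m=0}^{\infty} \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right)^m \le C_2 n^{-\frac{2}{9}}.$$

Therefore, using the decomposition (30), the total probability of failure is bounded by

$$\mathbb{P}(F) \le C_1 n^{-\epsilon/3} + C_2 n^{-2/9},$$

which converges to 0 as $n \to \infty$. This shows that the maximum likelihood estimator succeeds with
probability tending to 1 as $n \to \infty$, as wanted.

4.4 Proof of Theorem 3.3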

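Note that in this setting, we have

$$\begin{aligned}
I &= -2 \log \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \\
&= -2 \log\left( \sqrt{\left( 1 - \frac{u \log n}{n} \right) \left( 1 - \frac{v \log n}{n} \right)} + \sum_{\ell=1}^{L} \frac{\sqrt{a_\ell b_\ell} \log n}{n} \right) \\
&= -2 \log\left( \left( 1 - \frac{u \log n}{2n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \left( 1 - \frac{v \log n}{2n} + O\left( \frac{\log^2 n}{n^2} \right) \right) + \sum_{\ell=1}^{L} \frac{\sqrt{a_\ell b_\ell} \log n}{n} \right) \\
&= -2 \log\left( 1 - \frac{u \log n}{2n} - \frac{v \log n}{2n} + \sum_{\ell=1}^{L} \frac{\sqrt{a_\ell b_\ell} \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \\
&= -2 \left( -\frac{u \log n}{2n} - \frac{v \log n}{2n} + \sum_{\ell=1}^{L} \frac{\sqrt{a_\ell b_\ell} \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \\
&= \frac{C \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right),
\end{aligned}$$

where $C = \sum_{\ell=1}^{L} (\sqrt{a_\ell} - \sqrt{b_\ell})^2$, and the last step uses $u = \sum_{\ell=1}^{L} a_\ell$ and $v = \sum_{\ell=1}^{L} b_\ell$. In particular,

$$I = \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right). \tag{31}$$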
Corollary 3.1 (for K = 2 communities) and Theorem 3.2 (for more than two communities) then 
imply the desired result. 
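The expansion of $I$ in this regime is easy to check numerically. The sketch below builds $p_n$ and $q_n$ with $p_n(\ell) = a_\ell \log n / n$ and $q_n(\ell) = b_\ell \log n / n$ for $\ell \ge 1$ (the remaining mass sits at 0), and compares $n I / \log n$ with $\sum_\ell (\sqrt{a_\ell} - \sqrt{b_\ell})^2$. The constants `a` and `b` are hypothetical values chosen for illustration.

```python
import math

def renyi_half(p, q):
    """Renyi divergence of order 1/2: -2 log sum_l sqrt(p(l) q(l))."""
    return -2.0 * math.log(sum(math.sqrt(x * y) for x, y in zip(p, q)))

def sparse_pair(a, b, n):
    """p_n and q_n with masses a_l log n / n and b_l log n / n on labels l >= 1."""
    scale = math.log(n) / n
    p = [1.0 - sum(a) * scale] + [x * scale for x in a]
    q = [1.0 - sum(b) * scale] + [y * scale for y in b]
    return p, q

def threshold_constant(a, b):
    """C = sum_l (sqrt(a_l) - sqrt(b_l))^2."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def normalized_divergence(a, b, n):
    """n I / log n, which should approach C as n grows."""
    p, q = sparse_pair(a, b, n)
    return n * renyi_half(p, q) / math.log(n)

a, b = [4.0, 1.0], [1.0, 0.25]  # hypothetical constants a_l, b_l
for n in (10**3, 10**5, 10**7):
    print(n, normalized_divergence(a, b, n), threshold_constant(a, b))
```

For these constants $C = 1.25 > 1$, so the example sits on the side of the threshold where maximum likelihood succeeds.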


4.5 Proof of Theorem 3.4 

We will follow the proof strategy of Abbe et al. [2]. We will show that if

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 < 1,$$

then with probability at least 1/3, we can find nodes $i \in A$ and $j \in B$ such that exchanging
their community assignments yields a larger likelihood than the ground truth. This would establish
that the maximum likelihood estimator fails with probability at least 1/3. Although we will establish
the proof for the case of two communities, we note that the proof below trivially extends
to $K > 2$ communities each of size $n$, simply by taking $A$ and $B$ to be any two fixed communities
from the $K$ communities.


Let $A = \{1, 2, \dots, n\}$ and $B = \{n+1, n+2, \dots, 2n\}$. For $i \ne j$, let $w_{ij} \in \{0, 1, \dots, L\}$ be the
weight of the edge $(i, j)$. Just as in the case of unlabeled edges, maximizing the likelihood in the
labeled case may be interpreted as finding the min-cut for the stochastic block model, where the
weight of an edge with color $\ell \in \{0, \dots, L\}$ is $\log \frac{p_n(\ell)}{q_n(\ell)}$. For ease of notation, define the function

$$d_n(\ell) := \log \frac{p_n(\ell)}{q_n(\ell)}.$$

We may describe $d_n$ explicitly as

$$d_n(0) = \log \frac{1 - u \log n / n}{1 - v \log n / n}, \quad \text{and} \quad d_n(\ell) = \log \frac{a_\ell}{b_\ell} \ \text{ for } 1 \le \ell \le L. \tag{32}$$

Note that since $d_n(0) \to 0$ as $n \to \infty$, we may find a constant $\mathcal{M} > 0$ that upper-bounds $d_n$ for all
$n$. Thus,

$$\mathcal{M} \ge \max_{\ell} d_n(\ell), \quad \text{for all } n \text{ and all } 0 \le \ell \le L.$$

For any node $i$ and any subset of nodes $H$, denote

$$S(i, H) = \sum_{j \in H \setminus \{i\}} d_n(w_{ij}).$$

Using an argument along the lines of Lemma 4.1, it is easy to check that if there exist nodes $i \in A$
and $j \in B$ such that

$$S(i, A \setminus \{i\}) + S(j, B \setminus \{j\}) < S(i, B \setminus \{j\}) + S(j, A \setminus \{i\}), \tag{33}$$

then the community assignment where $\sigma(i) = B$ and $\sigma(j) = A$ and every other assignment remains
the same is more likely than the truth. Thus, the maximum likelihood estimator will fail if this
happens. Define the following events:

$$F = \{\text{maximum likelihood fails}\},$$

$$F_A = \{\exists\, i \in A : S(i, A \setminus \{i\}) < S(i, B) - \mathcal{M}\},$$

$$F_B = \{\exists\, j \in B : S(j, B \setminus \{j\}) < S(j, A) - \mathcal{M}\}.$$


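We have the following simple lemma:

Lemma 4.4. If $\mathbb{P}(F_A) \ge 2/3$, then $\mathbb{P}(F) \ge 1/3$.

Proof. By symmetry, we have $\mathbb{P}(F_B) \ge 2/3$, so by a union bound, $\mathbb{P}(F_A \cap F_B) \ge 1/3$. Thus, with
probability at least 1/3, there exist nodes $i \in A$ and $j \in B$ such that

$$S(i, A \setminus \{i\}) < S(i, B) - \mathcal{M} \le S(i, B) - d_n(w_{ij}) = S(i, B \setminus \{j\}), \quad \text{and}$$

$$S(j, B \setminus \{j\}) < S(j, A) - \mathcal{M} \le S(j, A) - d_n(w_{ij}) = S(j, A \setminus \{i\}).$$

This implies

$$S(i, A \setminus \{i\}) + S(j, B \setminus \{j\}) < S(i, B \setminus \{j\}) + S(j, A \setminus \{i\}),$$

which from expression (33), implies that the maximum likelihood estimator fails. □

We now define $\gamma(n)$ and $\delta(n)$ as follows:

$$\gamma(n) = (\log n)^{\log^{2/3} n}, \quad \text{and} \quad \delta(n) = \frac{\sqrt{\log n}}{\log \log n}.$$

Let $H$ be a fixed subset of $A$ of size $\frac{n}{\gamma(n)}$. We will take $\gamma(n) \asymp (\log n)^{\log^{2/3} n}$ such that $\frac{n}{\gamma(n)}$ is an
integer. Define the event $\Delta$ as follows:

$$\Delta = \{\text{for all nodes } i \in H,\ S(i, H) < \delta(n)\}.$$

We then have the following lemma:

Lemma 4.5 (Proof in Appendix A.1). $\mathbb{P}(\Delta) \ge \frac{9}{10}$.

Finally, define the events $F_H^{(i)}$ and $F_H$ as follows:

$$F_H^{(i)} = \{\text{node } i \in H \text{ satisfies } S(i, A \setminus H) + \delta(n) < S(i, B) - \mathcal{M}\}, \qquad F_H = \bigcup_{i \in H} F_H^{(i)},$$

and define

$$\rho(n) = \mathbb{P}\left( F_H^{(i)} \right). \tag{34}$$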

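We have the following result:

Lemma 4.6. If $\rho(n) > \frac{\gamma(n) \log 10}{n}$, then $\mathbb{P}(F) \ge 1/3$ for sufficiently large $n$.

Proof. We first show that $\mathbb{P}(F_H) \ge \frac{9}{10}$ for large enough $n$. Since the events $F_H^{(i)}$ are independent,
we have

$$\mathbb{P}(F_H) = \mathbb{P}\left( \bigcup_{i \in H} F_H^{(i)} \right) = 1 - \mathbb{P}\left( \bigcap_{i \in H} \big( F_H^{(i)} \big)^c \right) = 1 - (1 - \rho(n))^{\frac{n}{\gamma(n)}}.$$

Clearly, if $\rho(n)$ is not $o(1)$, then $\mathbb{P}(F_H)$ tends to 1 and we are done. If $\rho(n)$ is $o(1)$, then

$$\lim_{n \to \infty} (1 - \rho(n))^{\frac{n}{\gamma(n)}} = \lim_{n \to \infty} \left( (1 - \rho(n))^{\frac{1}{\rho(n)}} \right)^{\frac{\rho(n) n}{\gamma(n)}} = \lim_{n \to \infty} \exp\left( -\frac{\rho(n) n}{\gamma(n)} \right) \le \frac{1}{10},$$

where the last inequality used the fact that $\rho(n) > \frac{\gamma(n) \log 10}{n}$. Hence, $\mathbb{P}(F_H) \ge \frac{9}{10}$, as claimed.

Now note that $\Delta \cap F_H \subseteq F_A$. By Lemma 4.5, we also have $\mathbb{P}(\Delta) \ge \frac{9}{10}$. Hence,

$$\mathbb{P}(F_A) \ge \mathbb{P}(\Delta) + \mathbb{P}(F_H) - 1 \ge \frac{8}{10} > \frac{2}{3},$$

which combined with Lemma 4.4 implies the desired result. □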
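The arithmetic behind the factor $\log 10$ in Lemma 4.6 can be checked directly: if each of $m$ independent events has probability $\rho = \log 10 / m$, the probability that none of them occurs is $(1 - \rho)^m \le e^{-\rho m} = 1/10$. In the sketch below, $m$ plays the role of $|H| = n / \gamma(n)$; the function names and the tested values of $m$ are illustrative assumptions.

```python
import math

def prob_no_event(rho, m):
    """(1 - rho)^m: probability that none of m independent events of probability rho occurs."""
    return math.exp(m * math.log1p(-rho))

def threshold_rho(m):
    """rho = log(10) / m, the boundary case of the condition in Lemma 4.6 (requires m >= 3)."""
    return math.log(10.0) / m

for m in (10, 1000, 10**6):
    print(m, prob_no_event(threshold_rho(m), m))
```

Any larger $\rho$ only decreases $(1 - \rho)^m$, so the union of the events then has probability at least 9/10.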


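Let $\{X_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $p_n$, and let $\{Y_i\}_{i \ge 1}$
be a sequence of i.i.d. random variables distributed according to $q_n$. From the definition (34) of
$\rho(n)$, and using independence, we have

$$\begin{aligned}
\rho(n) &= \mathbb{P}\left( \sum_{i=1}^{n} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) \ge \delta(n) + \mathcal{M} \right) \\
&\ge \mathbb{P}\left( \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) \ge \delta(n) + \mathcal{M} - \tilde{\delta}(n) \right) \times \mathbb{P}\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) \ge \tilde{\delta}(n) \right),
\end{aligned} \tag{35}$$

for any $\tilde{\delta}(n)$. We will choose a suitable $\tilde{\delta}(n)$ so that

$$\mathbb{P}\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) \ge \tilde{\delta}(n) \right) \to 1. \tag{36}$$

Note that $d_n(Y_i)$ is a random variable satisfying

$$\mathbb{P}\left( d_n(Y_i) = \log \frac{1 - u \log n / n}{1 - v \log n / n} \right) = 1 - \frac{v \log n}{n}.$$

Thus,

$$\mathbb{P}\left( d_n(Y_i) = \log \frac{1 - u \log n / n}{1 - v \log n / n}, \ \text{ for all } n - \frac{n}{\gamma(n)} + 1 \le i \le n \right) = \left( 1 - \frac{v \log n}{n} \right)^{\frac{n}{\gamma(n)}}.$$

We may check that

$$\left( 1 - \frac{v \log n}{n} \right)^{\frac{n}{\gamma(n)}} \to 1,$$

implying that

$$\mathbb{P}\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) = \frac{n}{\gamma(n)} \log \frac{1 - u \log n / n}{1 - v \log n / n} \right) \to 1.$$

Thus, equation (36) holds with

$$\tilde{\delta}(n) = \frac{n}{\gamma(n)} \cdot \log \frac{1 - u \log n / n}{1 - v \log n / n}. \tag{37}$$

Since $\tilde{\delta}(n) = O\left( \frac{\log n}{\gamma(n)} \right) \to 0$ and $\delta(n) = \frac{\sqrt{\log n}}{\log \log n}$, we have $\delta(n) + \mathcal{M} - \tilde{\delta}(n) = o(\sqrt{\log n})$.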

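Recall the definition of the function $T$ in equation (17). We have the following technical lemma:

Lemma 4.7 (Proof in Appendix A.2). Let $\omega(n) = o(\sqrt{\log n})$ and $N(n) = n(1 + o(1))$. Then

$$-\log T\left( N(n), p_n, q_n, \omega(n) \right) \le \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \log n + o(\log n).$$

Noting that

$$T\left( n - \frac{n}{\gamma(n)}, p_n, q_n, \delta(n) + \mathcal{M} - \tilde{\delta}(n) \right) = \mathbb{P}\left( \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) \ge \delta(n) + \mathcal{M} - \tilde{\delta}(n) \right),$$

and using Lemma 4.7, we conclude that

$$-\log T\left( n - \frac{n}{\gamma(n)}, p_n, q_n, \delta(n) + \mathcal{M} - \tilde{\delta}(n) \right) \le \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \log n + o(\log n). \tag{38}$$

Substituting the bounds (36) and (38) into equation (35), we then conclude that

$$-\log \rho(n) \le \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \log n + o(\log n).$$

In particular, when

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 < 1,$$

we have

$$-\log \rho(n) \le \log n - \log \gamma(n) - \log \log 10,$$

for sufficiently large $n$. Lemma 4.6 then implies that maximum likelihood fails with probability at
least 1/3, completing the proof of the theorem.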


5 Discussion and open questions 

We have established thresholds for exact recovery in the framework of weighted stochastic block
models, where edge weights may be drawn from arbitrary distributions. Whereas previous investigations
had concentrated on the setting of unweighted edges, we show that the same techniques
may be extended to the weighted case. Furthermore, the Renyi divergence of order $\frac{1}{2}$ between the
distributions of edges coming from within-community and between-community connections arises
as a fundamental quantity governing the hardness of the community estimation problem.


The conclusions of this paper leave open a number of questions regarding phase transitions
in general weighted stochastic block models. We conclude our paper by highlighting several
interesting directions for future research.


• Thresholds for exact recovery under continuous distributions. Although the error
bound for maximum likelihood derived in Theorem 3.1 does not impose any conditions on
the distributions $p_n$ and $q_n$, the proofs of the upper and lower bounds in Section 3.2 assume
a specific setting involving discrete distributions with the same support. However, situations
may arise where the observed edge weights are generated from continuous distributions. The
submatrix localization problem in Section 3.4 provides one such example. It would be interesting
to see if the Renyi divergence between $p_n$ and $q_n$ again plays a role in characterizing
the threshold for exact recovery in the continuous case. However, a number of hurdles exist
in extending our proof of impossibility to continuous distributions. Just as with discrete
distributions, our proof technique does not allow for distributions that are not absolutely continuous
with respect to each other. Furthermore, we have assumed the existence of a finite
upper bound $\mathcal{M}$ on the likelihood ratio between $p_n$ and $q_n$. Such a bound may not exist even
for absolutely continuous distributions; for example, no such bound exists for $p_n = \mathcal{N}(\mu_n, 1)$
and $q_n = \mathcal{N}(0, 1)$ in the submatrix localization problem. Finally, the emergence and relevance
of the Renyi divergence term as a sharp threshold in this problem may be attributed in part to
the specific regime we have considered, where the probabilities of connection scale according
to $\Theta(\log n / n)$. Mossel et al. [29] have shown that for Bernoulli distributions $p_n$ and $q_n$ in
slightly denser regimes, where the probabilities scale according to 0 , the threshold is 

no longer simply a function of the Renyi divergence. 

• General thresholds for weighted distributions. Mossel et al. [29] derive a very general 
theorem involving thresholds for the binary stochastic block model when $K = 2$. Defining

$$P(n, p_n, q_n) = \mathbb{P}\left( \sum_{i=1}^{n} Y_i \ge \sum_{i=1}^{n} X_i \right), \tag{39}$$

where $X_i \sim p_n$ and $Y_i \sim q_n$, and $p_n$ and $q_n$ are Bernoulli distributions such that $p_n$ stochastically
dominates $q_n$, Mossel et al. [29] prove that exact recovery of the two communities is
possible if and only if $P(n, p_n, q_n) = o(1/n)$. On the other hand, there exists an estimator for
which the fraction of misclassified nodes converges to 0 if and only if $P(n, p_n, q_n) = o(1)$. It
would be interesting to derive such a statement when $p_n$ and $q_n$ are general distributions,
which could then be used to prove our results in Section 3.2 as a special case. Specifically,
one might construct the analog of expression (39) to be

$$P(n, p_n, q_n) = \mathbb{P}\left( \sum_{i=1}^{n} d_n(Y_i) - \sum_{i=1}^{n} d_n(X_i) \ge 0 \right),$$

and conjecture analogous results about exact and partial recovery based on the rate at which
$P(n, p_n, q_n)$ converges to 0.

• Efficient algorithms for exact recovery in weighted stochastic block models. Hajek 
et al. [17, 18] and Gao et al. [15] provide efficiently computable algorithms that achieve 
the threshold for exact recovery in the case of binary stochastic block models. Now that 
we have characterized the threshold for a more general class of weighted distributions, it 
would be interesting to see if similar efficient algorithms may be derived to obtain community 
assignments in the weighted case. 



A Proofs of technical lemmas 

In this section, we collect the proofs of the more technical lemmas used in proving the main results. 


A.l Proof of Lemma 4.5 

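Let $\Delta_i$ be the event $S(i, H) < \delta(n)$. By a simple union bound calculation, we have

$$\mathbb{P}(\Delta) = 1 - \mathbb{P}(\Delta^c) = 1 - \mathbb{P}\left( \bigcup_{i \in H} \Delta_i^c \right) \ge 1 - |H| \cdot \mathbb{P}(\Delta_i^c).$$

We will show that

$$|H| \cdot \mathbb{P}(\Delta_i^c) = o(1),$$

by showing that

$$\log |H| + \log \mathbb{P}(\Delta_i^c) \to -\infty,$$

as $n \to \infty$. Let the weights of the edges from $i$ to nodes within $H$ be the random variables
$\{X_1, \dots, X_{|H| - 1}\}$. Note that the $X_i$'s are independent and identically distributed according to $p_n$.
We have

$$\mathbb{P}(\Delta_i^c) = \mathbb{P}\left( S(i, H) \ge \frac{\sqrt{\log n}}{\log \log n} \right) = \mathbb{P}\left( \sum_{i=1}^{|H| - 1} d_n(X_i) \ge \frac{\sqrt{\log n}}{\log \log n} \right) \le \inf_{t > 0} \frac{\left( \mathbb{E}\left[ e^{t d_n(X_1)} \right] \right)^{|H| - 1}}{e^{\frac{t \sqrt{\log n}}{\log \log n}}},$$

using a Chernoff bound in the last inequality. Thus, for $t > 0$, we have

$$\log |H| + \log \mathbb{P}(\Delta_i^c) \le \log \frac{n}{\gamma(n)} + \log \frac{\left( \mathbb{E}\left[ e^{t d_n(X_1)} \right] \right)^{\frac{n}{\gamma(n)} - 1}}{e^{\frac{t \sqrt{\log n}}{\log \log n}}} = \log \frac{n}{\gamma(n)} + \left( \frac{n}{\gamma(n)} - 1 \right) \log \mathbb{E}\left[ e^{t d_n(X_1)} \right] - \frac{t \sqrt{\log n}}{\log \log n}.$$

Picking $t = \sqrt{\log n} \log \log n$, the last expression simplifies to

$$-\log \gamma(n) + \left( \frac{n}{\gamma(n)} - 1 \right) \log \mathbb{E}\left[ e^{\sqrt{\log n} (\log \log n) d_n(X_1)} \right]. \tag{40}$$

We now analyze $\log \mathbb{E}\left[ e^{\sqrt{\log n} (\log \log n) d_n(X_1)} \right]$ carefully. Note that

$$\log \mathbb{E}\left[ e^{\sqrt{\log n} (\log \log n) d_n(X_1)} \right] = \log\left( \left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\sqrt{\log n} \log \log n} \left( 1 - \frac{u \log n}{n} \right) + \sum_{\ell=1}^{L} \left( \frac{a_\ell}{b_\ell} \right)^{\sqrt{\log n} \log \log n} \frac{a_\ell \log n}{n} \right) := \log(1 + \mu_n + \nu_n),$$

where

$$\mu_n = \left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\sqrt{\log n} \log \log n} \left( 1 - \frac{u \log n}{n} \right) - 1,$$

and

$$\nu_n = \sum_{\ell=1}^{L} \left( \frac{a_\ell}{b_\ell} \right)^{\sqrt{\log n} \log \log n} \frac{a_\ell \log n}{n}.$$

The following bound holds for $\nu_n$:

$$\nu_n = \sum_{\ell=1}^{L} \left( \frac{a_\ell}{b_\ell} \right)^{\sqrt{\log n} \log \log n} \frac{a_\ell \log n}{n} \le C_1 \frac{(\log n)^{C_2 \sqrt{\log n}}}{n},$$

for suitable constants $C_1, C_2$. For $\mu_n$, we have

$$\mu_n = \left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\sqrt{\log n} \log \log n} \left( 1 - \frac{u \log n}{n} \right) - 1 = \left( \left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\frac{n}{\log n}} \right)^{\frac{(\log n)^{3/2} \log \log n}{n}} \left( 1 - \frac{u \log n}{n} \right) - 1.$$

The term $\left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\frac{n}{\log n}}$ tends to a constant, $\exp(v - u)$. Thus, for large enough $n$, we may find
constants $0 < c_1 < c_2$ such that $\left( \frac{1 - u \log n / n}{1 - v \log n / n} \right)^{\frac{n}{\log n}} \in (c_1, c_2)$. Using the Taylor series approximation
of $c^x$ near 0, we have

$$c_1^{\frac{(\log n)^{3/2} \log \log n}{n}} = 1 + \frac{(\log n)^{3/2} \log \log n}{n} \log c_1 + O\left( \left( \frac{(\log n)^{3/2} \log \log n}{n} \right)^2 \right),$$

so

$$c_1^{\frac{(\log n)^{3/2} \log \log n}{n}} \left( 1 - \frac{u \log n}{n} \right) - 1 = \frac{(\log n)^{3/2} \log \log n}{n} \log c_1 + O\left( \left( \frac{(\log n)^{3/2} \log \log n}{n} \right)^2 \right) - \frac{u \log n}{n} \left( 1 + \frac{(\log n)^{3/2} \log \log n}{n} \log c_1 + O\left( \left( \frac{(\log n)^{3/2} \log \log n}{n} \right)^2 \right) \right),$$

and the same expansion holds with $c_2$ in place of $c_1$. Thus, for large enough $n$, there exists a constant $C_3$ that satisfies

$$|\mu_n| \le \frac{C_3 \log^2 n}{n}.$$

Using the bound

$$\log(1 + \mu_n + \nu_n) \le |\mu_n| + \nu_n,$$

we conclude that

$$\log \mathbb{E}\left[ e^{\sqrt{\log n} (\log \log n) d_n(X_1)} \right] \le C_1' \frac{(\log n)^{C_2' \sqrt{\log n}}}{n},$$

for suitable constants $C_1'$ and $C_2'$. Returning to the expression (40), we conclude that

$$-\log \gamma(n) + \left( \frac{n}{\gamma(n)} - 1 \right) \log \mathbb{E}\left[ e^{\sqrt{\log n} (\log \log n) d_n(X_1)} \right] \le -\log \gamma(n) + \left( \frac{n}{\gamma(n)} - 1 \right) C_1' \frac{(\log n)^{C_2' \sqrt{\log n}}}{n}.$$

Substituting $\gamma(n) = (\log n)^{\log^{2/3} n}$, we arrive at the upper bound

$$-\log^{2/3} n\, (\log \log n) + \left( \frac{n}{(\log n)^{\log^{2/3} n}} - 1 \right) C_1' \frac{(\log n)^{C_2' \sqrt{\log n}}}{n}.$$

It is easy to check that as $n \to \infty$, we have

$$\frac{n}{(\log n)^{\log^{2/3} n}} \cdot \frac{(\log n)^{C_2' \sqrt{\log n}}}{n} \to 0,$$

and

$$-\log^{2/3} n\, (\log \log n) \to -\infty.$$

This concludes the proof.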


A.2 Proof of Lemma 4.7 

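We will use the proof strategy found in Zhang and Zhou [40]. Let

$$Z = d_n(Y) - d_n(X),$$

where $X \sim p_n$ and $Y \sim q_n$. Let $M(t) = \mathbb{E} e^{tZ}$, and recall the following results from the proof of
Theorem 3.1:

$$t^* = \arg\min_{t > 0} M(t) = \frac{1}{2},$$

$$M(t^*) = \left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2,$$

$$I = -\log M(t^*) = -2 \log\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right).$$

Let $S_N = \sum_{i=1}^{N(n)} Z_i$, where the $Z_i$'s are i.i.d. and distributed according to $Z$, and denote the
distribution of $Z$ by $p_Z$. Define

$$\eta(n) = \log^{3/4} n.$$

Then

$$\begin{aligned}
\mathbb{P}(S_N \ge \omega(n)) &\ge \sum_{z :\, s_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} p_Z(z_i) \\
&\ge \frac{M(t^*)^{N(n)}}{e^{t^* \eta(n)}} \sum_{z :\, s_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^* z_i} p_Z(z_i)}{M(t^*)} \\
&= \exp\left( -N(n) I - \frac{\eta(n)}{2} \right) \sum_{z :\, s_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^* z_i} p_Z(z_i)}{M(t^*)},
\end{aligned} \tag{41}$$

where $s_N = \sum_{i} z_i$, and the second inequality uses the fact that $e^{t^* \eta(n)} \ge e^{t^* \sum_i z_i}$ when $\sum_i z_i < \eta(n)$.

Now denote $r(w) = \frac{e^{t^* w} p_Z(w)}{M(t^*)}$, and note that $r$ defines a probability distribution. Defining
$W_1, W_2, \dots, W_N$ to be i.i.d. random variables with probability mass function $r(w)$, we then have

$$\sum_{z :\, s_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^* z_i} p_Z(z_i)}{M(t^*)} = \mathbb{P}\left( \omega(n) \le \sum_{i=1}^{N(n)} W_i < \eta(n) \right). \tag{42}$$

We also have the following concentration result:

Lemma A.1 (Proof in Appendix A.3). Let $\{W_i\}_{i \ge 1}$ be i.i.d. random variables distributed as $r(w)$.
Then

$$\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}} \xrightarrow{d} \mathcal{N}(0, \nu^2),$$

as $n \to \infty$, where $\nu > 0$ is a constant.

By Lemma A.1, it follows that

$$\frac{\sum_{i=1}^{N(n)} W_i}{\sqrt{\log N(n)}} \xrightarrow{d} \mathcal{N}(0, \nu^2),$$

for some constant $\nu > 0$. Furthermore, by our choices of $\omega(n)$, $N(n)$, and $\eta(n)$, we have

$$\frac{\omega(n)}{\sqrt{\log N(n)}} \to 0, \quad \text{and} \quad \frac{\eta(n)}{\sqrt{\log N(n)}} \to +\infty.$$

Thus,

$$\mathbb{P}\left( \frac{\omega(n)}{\sqrt{\log N(n)}} \le \frac{\sum_{i=1}^{N(n)} W_i}{\sqrt{\log N(n)}} < \frac{\eta(n)}{\sqrt{\log N(n)}} \right) \to \frac{1}{2},$$

implying that the left-hand probability expression becomes larger than 1/4 for all large enough $n$.
Combining this with the bounds (41) and (42), we then obtain

$$\mathbb{P}(S_N \ge \omega(n)) \ge \exp\left( -N(n) I - \frac{\eta(n)}{2} - \log 4 \right).$$

Now recall the computation in equation (31). Using $N = n(1 + o(1))$, we arrive at

$$-\log T\left( N(n), p_n, q_n, \omega(n) \right) = -\log \mathbb{P}(S_N \ge \omega(n)) \le \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \log n + o(\log n).$$

This concludes the proof.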


A.3 Proof of Lemma A.l 

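We show that the moment generating function of $\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}}$ converges to that of a normal random
variable. By a simple computation, we may check that $r$ is a sum of delta distributions with mass

$$r\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)} \right) = \frac{\sqrt{p_n(x) q_n(x) p_n(y) q_n(y)}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2}, \tag{43}$$

at the point $\log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)}$, for all $0 \le x, y \le L$. Note that the right-hand side of equation (43)
is symmetric with respect to $x$ and $y$, implying that $r$ is a symmetric distribution. For $x, y \ne 0$, we
then have

$$r\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)} \right) = \frac{\sqrt{a_x b_x a_y b_y}\, \frac{\log^2 n}{n^2}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2} = O\left( \frac{\log^2 n}{n^2} \right).$$

For $x = 0$ and $y \ne 0$ (and by symmetry, for $y = 0$ and $x \ne 0$), we have

$$r\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(0)}{q_n(0)} \right) = \frac{\sqrt{(1 - u \log n / n)(1 - v \log n / n)\, a_y b_y}\, \frac{\log n}{n}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2} = \frac{c_y \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right),$$

for a suitable constant $c_y > 0$. Hence,

$$r(0) = 1 - \frac{c_0 \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right),$$

for some constant $c_0 > 0$.

We now examine the range of $W = W_i$, which we denote by the set $\mathcal{W}$. Note that the range
is finite, since $W$ can only take values from the set $\left\{ \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)} : 0 \le x, y \le L \right\}$. Also note that
the range depends on $n$, since the ratio $\log \frac{p_n(0)}{q_n(0)}$ changes with $n$. However, since $\log \frac{p_n(0)}{q_n(0)} = O\left( \frac{\log n}{n} \right)$,
this dependence may only perturb the range by $O\left( \frac{\log n}{n} \right)$. Thus, we may fix constants
$\{0, \pm w_1, \dots, \pm w_R\}$ such that the range of $W$ is given by

$$\mathcal{W} = \{0, \pm \tilde{w}_1, \dots, \pm \tilde{w}_R\}, \quad \text{where } \tilde{w}_i = w_i + O\left( \frac{\log n}{n} \right), \text{ for } 1 \le i \le R.$$

Since $W$ is a symmetric random variable, it is easy to see that its moment generating function is
given by

$$\mathbb{E}\left[ e^{tW} \right] = 1 + \sum_{i=1}^{R} r(\tilde{w}_i) \left( e^{t \tilde{w}_i / 2} - e^{-t \tilde{w}_i / 2} \right)^2, \tag{44}$$

using the fact that $r(0) = 1 - \sum_{j=1}^{R} 2 r(\tilde{w}_j)$. As noted above, for certain nonzero $w \in \mathcal{W}$, we
have $r(\pm w) = \Theta\left( \frac{\log n}{n} \right)$, whereas for other values, we have $r(w) = O\left( \frac{\log^2 n}{n^2} \right)$. Without loss of
generality, let $r(\tilde{w}_j)$, for $1 \le j \le N$, be $\Theta\left( \frac{\log n}{n} \right)$, and let $r(\tilde{w}_j)$, for $N + 1 \le j \le R$, be $O\left( \frac{\log^2 n}{n^2} \right)$.
We then write $r(\tilde{w}_j) = \frac{c_j \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right)$, for $1 \le j \le N$. Using the expression (44), the moment
generating function of $W$ is then given by

$$\mathbb{E}\left[ e^{tW} \right] = 1 + \sum_{1 \le j \le N} \left( \frac{c_j \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \left( e^{t \tilde{w}_j / 2} - e^{-t \tilde{w}_j / 2} \right)^2 + \sum_{N + 1 \le j \le R} O\left( \frac{\log^2 n}{n^2} \right) \left( e^{t \tilde{w}_j / 2} - e^{-t \tilde{w}_j / 2} \right)^2.$$

Substituting $\frac{t}{\sqrt{\log n}}$ in place of $t$ and using the approximation $a^{x/2} - a^{-x/2} = x \log a + O(x^2 \log^2 a)$
for $x = o(1)$, we arrive at

$$\begin{aligned}
\mathbb{E}\left[ e^{\frac{tW}{\sqrt{\log n}}} \right] &= 1 + \sum_{1 \le j \le N} \left( \frac{c_j \log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \left( \frac{t \tilde{w}_j}{\sqrt{\log n}} + O\left( \frac{1}{\log n} \right) \right)^2 + \sum_{N + 1 \le j \le R} O\left( \frac{\log^2 n}{n^2} \right) \left( \frac{t \tilde{w}_j}{\sqrt{\log n}} + O\left( \frac{1}{\log n} \right) \right)^2 \\
&= 1 + \frac{C t^2}{n} + O\left( \frac{1}{n \sqrt{\log n}} \right),
\end{aligned}$$

for a suitable constant $C$, where the second equality uses the fact that $\tilde{w}_j = w_j + O\left( \frac{\log n}{n} \right)$, for all
$1 \le j \le R$. Hence, the moment generating function of $\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}}$ is given by

$$\left( 1 + \frac{C t^2}{n} + O\left( \frac{1}{n \sqrt{\log n}} \right) \right)^n \to \exp(C t^2),$$

which is the moment generating function of $\mathcal{N}(0, 2C)$. This completes the proof.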
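The symmetry of the tilted distribution $r$ used in this proof can be verified directly for a small discrete example. The sketch below enumerates the law of $Z = d_n(Y) - d_n(X)$ with $X \sim p$, $Y \sim q$, tilts it by $e^{z/2}$, normalizes, and checks that the resulting masses at $z$ and $-z$ coincide. The distributions `p` and `q`, the rounding used to merge equal support points, and the function name are illustrative assumptions.

```python
import math
from collections import defaultdict

def tilted_distribution(p, q, t=0.5):
    """r(z) = e^{t z} p_Z(z) / M(t) for Z = d(Y) - d(X), X ~ p, Y ~ q, d = log(p / q)."""
    mass = defaultdict(float)
    for x, px in enumerate(p):
        for y, qy in enumerate(q):
            z = math.log(p[y] / q[y]) - math.log(p[x] / q[x])
            # Rounding merges support points that coincide up to floating-point error.
            mass[round(z, 12)] += math.exp(t * z) * px * qy
    M = sum(mass.values())  # equals M(t), the moment generating function of Z at t
    return {z: m / M for z, m in mass.items()}

p = [0.5, 0.3, 0.2]  # hypothetical within-community distribution
q = [0.7, 0.2, 0.1]  # hypothetical between-community distribution
r = tilted_distribution(p, q)
print(sorted(r.items()))
```

Because $r$ is symmetric, its mean is zero, which is why the tilted sum $\sum_i W_i$ concentrates around 0 on the scale used in Lemma A.1.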


References 

[1] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Decoding binary node labels from censored
edge measurements: Phase transition and efficient recovery. IEEE Transactions on Network
Science and Engineering, 1(1):10-22, 2014.

[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. arXiv
preprint arXiv:1405.3267, 2014.

[3] E. Abbe and A. Montanari. Conditional random fields, planted constraint satisfaction and entropy
concentration. In P. Raghavendra, S. Raskhodnikova, K. Jansen, and J. D. P. Rolim, editors,
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques,
volume 8096 of Lecture Notes in Computer Science, pages 332-346. Springer Berlin
Heidelberg, 2013.


[4] E. Abbe and C. Sandon. Community detection in general stochastic block models: Funda¬ 
mental limits and efficient recovery algorithms. arXiv preprint arXiv: 1503.00609, 2015. 

[5] A. A. Amini, A. Chen, P. J. Bickel, E. Levina, et al. Pseudo-likelihood methods for community 
detection in large sparse networks. The Annals of Statistics, 41(4):2097-2122, 2013. 

[6] R. Banuelos. Convolutions and approximation to the identity. Available online at
http://www.math.purdue.edu/~banuelos/ma544-05/lectures3.pdf.

[7] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex
weighted networks. Proceedings of the National Academy of Sciences of the United States of
America, 101(11):3747-3752, 2004.

[8] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities 
in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 
2008. 

[9] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks:
Structure and dynamics. Physics Reports, 424(4):175-308, 2006.

[10] Y. Chen, C. Suh, and A. J. Goldsmith. Information recovery from pairwise measurements: A
Shannon-theoretic approach. arXiv preprint arXiv:1504.01369, 2015.

[11] Y. Chen and J. Xu. Statistical-computational tradeoffs in planted problems and submatrix
localization with a growing number of clusters and submatrices. arXiv preprint arXiv:1402.1267,
2014.

[12] Y. Deshpande, E. Abbe, and A. Montanari. Asymptotic mutual information for the two-groups 
stochastic block model. arXiv preprint arXiv:1507.08685, 2015. 

[13] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly
Connected World. Cambridge University Press, New York, NY, USA, 2010.

[14] S. E. Fienberg, M. M. Meyer, and S. S. Wasserman. Statistical analysis of multiple sociometric
relations. Journal of the American Statistical Association, 80(389):51-67, 1985.

[15] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou. Achieving optimal misclassification proportion
in stochastic block model. arXiv preprint arXiv:1505.03772, 2015.

[16] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network 
models. Found. Trends Mach. Learn., 2(2):129-233, February 2010. 

[17] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite 
programming. arXiv preprint arXiv:1412.6156, 2014. 

[18] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite 
programming: Extensions. arXiv preprint arXiv:1502.07738, 2015. 

[19] E. Hartuv and R. Shamir. A clustering algorithm based on graph connectivity. Information
Processing Letters, 76(4-6):175-181, 2000.

[20] S. Heimlicher, M. Lelarge, and L. Massoulie. Community detection in the labelled stochastic
block model. arXiv preprint arXiv:1209.2910, 2012.


[21] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social 
Networks, 5(2):109-137, 1983. 

[22] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2010. 

[23] A. W. Knapp. Basic real analysis, volume 10. Springer Science & Business Media, 2005. 

[24] M. Lelarge, L. Massoulie, and J. Xu. Reconstruction in the labeled stochastic block model. In 
Information Theory Workshop (ITW), 2013 IEEE, pages 1-5. IEEE, 2013. 

[25] L. Massoulie. Community detection thresholds and the weak Ramanujan property. In
Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC ’14, pages 694-703.
ACM, 2014.

[26] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science,
2001. Proceedings. 42nd IEEE Symposium on, pages 529-537. IEEE, 2001.

[27] E. Mossel, J. Neeman, and A. Sly. Stochastic Block Models and Reconstruction. arXiv preprint
arXiv:1202.1499.

[28] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. arXiv 
preprint arXiv:1311.4115, 2013. 

[29] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for binary symmetric block models. 
arXiv preprint arXiv:1407.1591, 2014. 

[30] M. Newman, A.-L. Barabasi, and D. J. Watts. The Structure and Dynamics of Networks: 
(Princeton Studies in Complexity). Princeton University Press, Princeton, NJ, USA, 2006. 

[31] M. E. J. Newman. Analysis of weighted networks. Physical Review E, 70(5):056131, 2004. 

[32] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks.
Physical Review E, 69(2):026113, 2004.

[33] J. K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using
multilocus genotype data. Genetics, 155(2):945-959, 2000.

[34] M. Rubinov and O. Sporns. Complex network measures of brain connectivity: Uses and
interpretations. Neuroimage, 52(3):1059-1069, 2010. Computational Models of the Brain.

[35] D.S. Sade. Sociometrics of Macaca mulatta: I. Linkages and cliques in grooming matrices. 
Folia Primatologica, 18(3-4):196-223, 1972. 

[36] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal.
Mach. Intell., 22(8):888-905, August 2000.

[37] T. van Erven and P. Harremoes. Renyi divergence and Kullback-Leibler divergence. IEEE
Transactions on Information Theory, 60(7):3797-3820, 2014.

[38] S. Wasserman and C. Anderson. Stochastic a posteriori blockmodels: Construction and
assessment. Social Networks, 9(1):1-36, 1987.

[39] H. C. White, S. A. Boorman, and R. L. Breiger. Social structure from multiple networks: I.
Blockmodels of roles and positions. American Journal of Sociology, 81(4):730-780, 1976.


[40] A. Y. Zhang and H. H. Zhou. Minimax rates of community detection in stochastic block model. 
arXiv preprint arXiv:1507.05313, 2015. 

